# **Rhotics. New Data and Perspectives**

Edited by Lorenzo Spreafico Alessandro Vietti


This book provides an insight into the patterns of variation and change of rhotics in different languages and from a variety of perspectives. It sheds light on the phonetics, the phonology, the sociolinguistics and the acquisition of /r/-sounds in languages as diverse as Dutch, English, French, German, Greek, Hebrew, Italian, Malayalam, Romanian, Saraiki, Slovak, Tyrolean and Washili Shingazidja, thus contributing to the discussion on the unity and uniqueness of this group of sounds.

36,00 Euro

www.unibz.it/universitypress

Scripta Ladina Brixinensia dé fora da / Hrsg. / a cura di Paul Videsott Consei scientifich / Wissenschaftliches Komitee / Comitato scientifico Guntram Plangg (Innsbruck) Hans Goebl (Salzburg) Vol. I


Design: DOC.bz Printing: Dipdruck, Bruneck-Brunico

© 2013 by Bozen-Bolzano University Press Free University of Bozen-Bolzano All rights reserved 1st edition www.unibz.it/universitypress

ISBN 978-88-6046-055-4 E-ISBN 978-88-6046-102-5

This work—excluding the cover and the quotations—is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.




## Acknowledgments

This volume contains a collection of papers presented at the conference *'r-atics-3. Phonetics, phonology, sociolinguistics and typology of rhotics*, which was held at the Free University of Bozen-Bolzano (FUB) on 2 and 3 December 2011.

The idea for the conference was to continue the tradition established at the previous *'r-atics* meetings in Nijmegen (2000) and Brussels (2002), providing a forum for the presentation and discussion of current research on rhotics. In this respect, we would like to acknowledge Didier Demolin, Roeland van Hout and Hans Van de Velde for allowing us to pick up the title and the concept of the *'r-atics* workshops.

The entire process of peer-reviewing for each paper was only possible thanks to an external group of anonymous referees who made numerous valuable suggestions, many of which have been incorporated into the final version of the book.

We are deeply grateful to the Language Study Unit of the FUB, which funded the conference as well as the publication of this book. We would also like to thank the Language Study Unit team for their outstanding support in organizing the conference and the bu,press staff for their assistance in preparing this book.

## Introduction

#### Alessandro Vietti & Lorenzo Spreafico

#### Preamble

Writing an introduction to a 'new' book on rhotics appears quite an awkward task, especially if one harbours hopes to present new data and to envisage perspectives on the topic, as the subtitle to the volume suggests. Is there really anything new about rhotics?

Even from a quick overview of the contributions collected, the answer is definitely positive. Although phoneticians, above all, have made great progress in understanding the articulatory, acoustic and perceptual characteristics of rhotics and their exceptional variation (Recasens & Espinosa 2007; Engstrand et al. 2007; Proctor 2009; Lawson et al. 2011), the /r/ family still remains an anomalous case as a class of sounds for many well-known reasons:


The papers collected in this book thus clearly represent a step further towards a better understanding of rhotics in at least two ways: firstly, new data are collected on /r/ in many non-European languages, some of them coming from poorly (or not at all) described languages; secondly, different disciplinary standpoints are taken up in order to capture the kaleidoscopic /r/ phenomenology.

The primary goal of collecting descriptions of many languages is to document how /r/ is articulated and how it varies within distinct phonological systems. A twofold secondary aim is (a) to establish an empirical base for cross-linguistic and typological comparisons, which (b) could in turn be used as a benchmark against which to assess theories or generalizations about human spoken communication (language sound systems). As a consequence, this book brings together articles that examine various aspects of rhotics in fifteen languages (or language varieties), namely:


On the other hand, /r/ and related phenomena are captured under different theoretical and methodological perspectives, following the tradition of previous *'r-atics* workshops. Mechanisms and strategies of first (van 't Veer) and second (Syed) language acquisition, ultrasound-based comparison in bilinguals (Spreafico & Vietti), acoustic (Savu) and kinematic analysis of the articulation of /r/ (Scobbie et al.; Hoole et al.), phonological interpretation of allophonic variation (Patin) or phonological processes (Cohen), and socio-geographical representation of language variation from a diachronic angle (Van de Velde et al.; Sankoff & Blondeau; Romano), taken together, depict an enlightening and multifaceted image of r-sounds.

In the next section, the contributions are grouped according to the main perspective or scientific framework. The most insightful general questions emerging from the analysis are also reported and emphasized, in order to illustrate the range of transversal issues connecting papers to each other as well as connecting them all to less superficial issues related to the interaction between phonetics and phonology.

#### 2. Language acquisition and bilingualism

The three contributions that fall within the broad framework of (first and second) language acquisition and bilingualism are by van 't Veer, Syed, and Spreafico & Vietti.

The first paper, by van 't Veer, explores the hypothesis that /r/ is featurally underspecified, or not specified at all, for place of articulation (PoA). The author refers to data from a study published by Rose (2000, 2003; data are available on the CHILDES phonetic database) and contrasts them with typological and diachronic evidence in the literature. He reports on two different patterns of L1 acquisition by two children. The first child seems to categorize French /r/ more in terms of its intrinsic phonetic properties (namely as a uvular fricative), partly discarding the phonotactic distribution of the phone. The second child picked up phonotactic information more as adult speakers do, thus classifying /r/ as a rhotic and consequently not specifying it for PoA. The author adds to Rose's analysis an exploratory acoustic examination of a very limited set of tokens, aiming to compare the two speakers' productions and to search for differences in the acoustic output. The results point towards a similar production of /r/ in both children, thereby reopening a number of questions on the phonological representation of dorsal /ʀ/. What information is more easily recoverable from the input in ambivalent phonemes, distributional or segmental? Could this case support, as the author suggests, a view of phonology as substance-free, in which abstract representations are partly detached from acoustic information?

Syed investigates the patterns of acquisition of English [ɹ] by Pakistani learners. Perceived phonetic distance is used to measure the similarity of English [ɹ] to the neighbouring sounds in the English inventory as well as in the Saraiki consonant system. The Speech Learning Model's principle of equivalence classification (Flege 1995) is tested on a sample of 90 learners of English with varied competence and exposure to the L2. In accordance with the perceived distance between phones, a developmental pattern emerges from the analysis: English [ɹ] is acquired by learning to discriminate it from L2 [l] in the first place, then from L2 [w] and finally from Saraiki [r].

In their contribution, Spreafico & Vietti explore the articulatory properties of /r/ in simultaneous and sequential Tyrolean-Italian bilinguals. Using ultrasound tongue imaging, they examine whether adult bilinguals display different tongue shapes for rhotics in each language they speak, and whether bilinguals' articulatory patterns in each language are similar to those used by almost monolingual speakers. The results show that very late sequential bilinguals (for the sake of simplicity, read here 'almost monolinguals') do not present distinct lingual shapes for rhotics in the two languages, while the simultaneous bilinguals do. Moreover, inter-speaker comparison indicates that the articulatory patterns for rhotics used by simultaneous bilinguals differ from those used by the very late sequential bilingual speakers who serve as control subjects. To sum up, late sequential speakers transfer their /r/ from L1 to L2, whereas simultaneous bilinguals distinguish rhotics in the two languages, even if their rhotics are articulatorily different from those of the late sequential speakers. The study points to further directions of investigation: the articulatory means of phonological contrast within and between languages in bilinguals, the complex intertwining of articulation, acoustics and perception, and finally the role of sociophonetic factors in /r/ variation in simultaneous bilinguals.

#### 3. Phonetics and phonology

Studies in the field of experimental phonetics play a major role in the structure of the book: three of them (Hoole et al.; Scobbie et al.; Baltazani & Nicolaidis) present innovative and insightful evidence on the articulation patterns of rhotics in German, French, Slovak, Malayalam and Greek using UTI, EMA and EPG data. The following two contributions (Savu; Rieira & Romero) provide an acoustic analysis of the effects of coarticulation on the structure of /r/ in Romanian and American English. The last two papers in this section are more phonologically oriented: one proposes a CVCV phonology interpretation of /r/ allophonic variation in Washili Shingazidja, the other an OT account of some idiosyncratic phonological processes in the loanword phonology of Hebrew.

Hoole et al.'s paper focuses on the kinematic properties of rhotics as a special case of gestural coordination of consonant with consonant and consonant with vowel. Two sets of EMA data are presented. The first study explores the characteristics of /kr/ clusters in German and French compared to other obstruent-sonorant clusters, namely /kl/ and /kn/. The low overlap in plosive-rhotic clusters is discussed as a potential source of diachronic instability, which could in turn be conducive to metathesis. Articulatory synthesis is also used to explain further the reason for the low overlap.

The second study provides an analysis of syllabic liquids /l/ and /r/ in Slovak. First, the kinematic properties of the liquids are examined as a function of their position in the syllable; then an analysis of the articulatory coordination patterns is carried out. The most notable results emerging from the data are the following:


The authors discuss the phonological implications of the results by hypothesizing that syllabic consonants are typologically infrequent because they require a coordination pattern which is different from the default CV pattern. Consequently, in Slovak it is possible to have syllabic liquids because in absolute terms consonant-consonant coordination shows a low overlap, thus suspending "the basic principle of a continuous vocalic substrate with overlaid consonant constrictions". The general aim of the research is to study the emergence and development of sound patterns as a function of the patterns of articulatory coordination.

The contribution of Scobbie et al. is a high-speed ultrasound imaging investigation of the phonemic system of liquids in Malayalam, a Dravidian language spoken in southern India. Malayalam represents an interesting case study for two reasons: on the one hand, there is a complex system of contrasts in the liquids based on both a primary and a secondary articulation (clear-dark resonances); on the other hand, it works as a 'natural laboratory' to assess the potential and limits of the UTI technique for detecting basic lingual properties of phonological distinctiveness. In accordance with previous acoustic studies on Malayalam and instrumental articulatory research on Tamil and Kannada, the authors carefully document the system of contrasts in general and the ambivalent properties of the fifth liquid in particular. Exploring the static and dynamic characteristics of the five liquid phonemes, they raise a valuable range of questions and conjectures for future research. Among these, the following issues deserve to be mentioned:


Baltazani & Nicolaidis present an acoustic and articulatory (EPG) analysis of the Greek tap, which appears to be the dominant allophone of /r/ in many prosodic contexts (specifically in /Cr/ and /rC/ clusters and between vowels, but also as a singleton in phrase-initial and word-initial position). The presence of a vocalic element, together with the ballistic contact gesture, is interpreted here as an essential part of the sound structure of the rhotic (as in Savu's contribution), rather than as an effect of the gestural overlap between two consonants in CrV contexts (as suggested in Hoole et al.). On the latter account, the vocalic nucleus would emerge between the two consonants as an effect of their overlap, but the authors provide evidence against this, at least in Greek, by observing the occurrence of a vocoid in absolute initial position (#rV), where there is no other consonant to overlap with. The acoustic measurements show that the vocalic elements are longer than the constriction phase and that their vocalic quality reflects the formant values of the corresponding nuclear vowels, only more centralized. Complementing the acoustic investigation, the EPG data provide evidence for a classification of taps into two categories, with complete or incomplete closure. This distinction could suggest a view of taps as steps in a continuum from prototypical (fortis) taps to lenis taps to more vocalic realizations, as in a ladder towards a potential language change from (trills to) taps to approximants.

In a similar way, Savu explores the phonetic structure of taps in Romanian. The author puts forward the hypothesis that the constriction phase is surrounded by two vocalic elements (not one, as in Baltazani & Nicolaidis), which she considers components of the tap rather than intrusive or epenthetic vowels. Thus, the structure of a tap is a sequence of vocoid+constriction+vocoid, which is more evident in #rV, Cr and rC contexts. The primary aim of the study is to measure the formant structure and duration of the vocalic elements in order to establish their range of variation. In addition, a secondary and original goal is to investigate a possible resemblance between the vocoids in Cr and rC contexts and those in the VrV context. The preliminary results show that the vocalic elements bordering the tap closure tend to approach the quality of the syllabic vowels, even if they still position themselves in a mid-high central to front area. To further support the tap structure proposed in the paper, more evidence should be added by (a) quantifying the coarticulation effects in VrV sequences and (b) observing the behaviour of taps in contexts where there are no vowels on either side (like #rC, CrC and Cr#) and the taps function as syllabic nuclei, as in languages like Czech or Serbo-Croatian.

The role of a transitional vocalic element in Vr sequences is discussed within a different framework by Rieira & Romero. *Mutatis mutandis*, the hypothesis is again to prove that the vocoid should not be considered as a vocalic epenthesis, and consequently as the result of a phonological process, but rather as an unstable targetless transitional element affected by coarticulatory forces.

The study contains an acoustic analysis of Vr contexts in American English in slow and fast speech. First, segmentation procedures based on the identification of inflection points in the formant curves are used to divide the sequence into three components: the vowel, the transitional element and the rhotic consonant. Next, duration and formant structure are measured for the three components, focusing on stressed monosyllables. Finally, an ANOVA is carried out to test the hypothesis that the three components vary in relation to speech rate and vowel context. The effect of coarticulation is confirmed (a) by the variation of the schwa-like element as a function of the context and (b) by the influence exerted by speech rate (even if data in the latter case are reported for only one exemplar speaker).

The following two contributions aim to give a phonological explanation for the somewhat anomalous behaviour of rhotics in Washili Shingazidja and Modern Hebrew.

The first study, by Patin, provides a detailed description of the allophonic distribution of /r/ in Washili Shingazidja, a Bantu language spoken on Grande Comore (one of the five Comorian islands). The data are collected from a single speaker. In the basic allophonic pattern a trill [r] alternates with a tap [ɾ]: the trill appears in initial position (and, apparently, mainly in Arabic loanwords) and the tap in intervocalic position (also across a word boundary). The distributional scheme is complicated by the presence of a preceding consonant, which triggers a trill, or of a syllable with no high tones, which favors an approximant. The overall allophonic variation is accounted for within the CVCV phonology framework. Basically, the author suggests that a trill in absolute initial position corresponds to an underlying geminate and offers three arguments in support of his hypothesis:


However, the author admits that the preliminary CVCV phonology interpretation fails to account for the whole distribution pattern, such as the tap realization before [i] or the occurrence of the trill between a consonant and [i]. In addition, a wider sample of speakers is probably needed to gain a clearer idea of the relative weight of loanwords in the phonological process.

The second study, by Cohen, begins with the observation that phenomena not supported by the native Hebrew grammar seem to occur when /r/ is involved in loanwords from English into Hebrew. In particular, two phonological processes, reduplication (which is morphologically productive) and metathesis (not systematic), are likely to interact with the presence of /r/. In the process of adaptation of /r/ in loanwords, exceptional (read: not part of Hebrew phonology) prosodic phenomena appear: on the one hand, /r/ is metathesised from coda to onset (e.g. *kornfleks* > *kronfleks*); on the other hand, a pseudo-reduplication process moves /r/ from onset to coda to create pseudo-reduplicative patterns, as in *proportsja* > *porportsja*.

The author proposes an account within Optimality Theory (OT) to explain what looks like the emergence of universal grammar constraints. He assumes the stratified lexicon hypothesis, according to which the lexicon is divided into a core and a periphery with partially different phonologies (Paradis & LaCharité 1997). Therefore, constraints that are relevant to explaining loanword adaptation may not be applicable to native-word phonology.

To explain metathesis, \*Coda-r (a sub-specification of \*Coda) is proposed. This constraint outranks Max, LinearityN/LW (native/loanwords) and \*Cx (no complex syllable margins) and moves /r/ from coda to onset. The OT explanation is not entirely satisfactory where pseudo-reduplication is concerned. In that case, the same set of constraints plus Redup does not produce the correct output: for *proportsja* it predicts \**proprotsja* instead of the actual winner, *porportsja* (which violates \*Coda-r). Even if open to debate, the contribution raises a significant question: is \*Coda-r really a universal constraint? What kind of typological evidence do we have? It cannot be our ambition to answer these questions here, but a remarkable connection could be traced to the paper by Hoole et al., in which kinematic evidence for metathesis is reported as a consequence of low overlap in CrV sequences.
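The metathesis ranking summarized above can be illustrated with a minimal sketch of strict-domination evaluation in Python. Only the fact that \*Coda-r outranks the other three constraints comes from the text. The relative order of Max, Linearity and \*Cx, the candidate set (including the deletion candidate *konfleks*) and all violation counts are assumptions introduced here for illustration, and they are not taken from Cohen's paper.

```python
# Constraint ranking, highest first. That *Coda-r dominates the rest is from
# the text; the order among Max, Linearity and *Cx is an ASSUMPTION.
RANKING = ["*Coda-r", "Max", "Linearity", "*Cx"]

# Hand-assigned, purely illustrative violation counts (hypothetical).
TABLEAU = {
    "kornfleks": {"*Coda-r": 1, "Max": 0, "Linearity": 0, "*Cx": 3},  # faithful
    "kronfleks": {"*Coda-r": 0, "Max": 0, "Linearity": 1, "*Cx": 3},  # metathesis
    "konfleks":  {"*Coda-r": 0, "Max": 1, "Linearity": 0, "*Cx": 2},  # /r/ deleted
}


def violation_profile(candidate, ranking=RANKING, tableau=TABLEAU):
    """Return the candidate's violations as a tuple ordered by the ranking."""
    return tuple(tableau[candidate][constraint] for constraint in ranking)


def optimal_candidate(tableau=TABLEAU, ranking=RANKING):
    """Strict domination: compare violation profiles lexicographically."""
    return min(tableau, key=lambda cand: violation_profile(cand, ranking, tableau))


winner = optimal_candidate()
print(winner)
```

Under these assumptions the metathesized form wins because the faithful candidate is eliminated by \*Coda-r and the deletion candidate by Max. Reversing the assumed order of Max and Linearity would select the deletion candidate instead, which shows how much the outcome depends on rankings that the summary above leaves unspecified.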

#### 4. Language variation and change

The papers contained in the last section deal with the social and geographical variation of /r/ in three different areas: Romano presents data on the variability of rhotics in Italy; Van de Velde et al. analyse geographical variation in a diachronic perspective on the Dutch dialect in Flanders; likewise, Sankoff & Blondeau report on a sound change in progress in Montreal French. It should be noted, however, that Romano's and Sankoff & Blondeau's papers are (up to now) unpublished studies from the *'r-atics-2* conference, thus dating back ten years. As they still represent valuable contributions and missing pieces of evidence in the debate on rhotics, the two articles find their natural place within the structure of the book.

Romano's study is an accurate description of the allophonic distribution of /r/ in Standard Italian, as well as a detailed illustration of the socio-geographical variability of rhotics all over the Italian peninsula. The basic standard patterning is defined as an alternation of a trill and a tap, with the tap occurring only in intervocalic unstressed syllables (e.g. raro [raːɾo] < /ˈraro/ 'rare'), and the trill in the remaining contexts. Next, a wide range of coronal and dorsal variants are identified and classified as geographical (thus belonging to a geographical variety of Italian), social, idiolectal or pathological. As underlined by the author, further research on the articulation of /r/ in Italian, as well as on the sociolinguistic meanings attached to rhotics, is still needed (ten years ago, as today).

The last two papers examine the problematic sound change from apical to uvular /r/.

Van de Velde, Tops & van Hout discuss the socio-geographical spread of uvular /r/ in Flemish Dutch over a span of almost ninety years (from 1922 to 2009). The authors analyse three sets of data, two coming from traditional dialectal surveys and one collected with a more sociolinguistic approach. The combination of geographical and social methods proves to be an excellent instrument for capturing the dynamics of a sound change. In our opinion, implementing geographically-based rapid and anonymous surveys could represent a new perspective for a multidimensional documentation of language variation and change in Europe. This could be especially true if we aim to re-draw a map of the spread of uvular /r/ across Western Europe. Coming back to the contribution, the results show an ongoing change from apical to uvular /r/ in Flanders, in particular among the younger generations. Interestingly, in a context of considerable socio-geographical variability (e.g. twelve variants are registered in the RAS study), individual speakers are not likely to alternate between front and back rhotics.

A similar finding is described in Sankoff & Blondeau's paper, which reports a sound change in progress in Montreal French from apical to uvular /r/. These strands of independent evidence (supported also by the study of Vietti & Spreafico 2008) seem to lead to the conclusion that some sound changes, at least at the level of the individual grammar, ought to be categorical, while others, like vowel shifts for instance (see recently Harrington 2006), have to be incremental in nature. In their paper, Sankoff & Blondeau analyse in particular the sociolinguistic behaviour of two speakers who show a pattern of [r]-[ʀ] variation (contained in the interval between 20 and 80%), in order to understand the phonological and stylistic conditioning factors. The process of change from a variable to a categorical use of [ʀ] passes through a phase of prosodic conditioning that favours the occurrence of uvular /r/ in syllable coda. On the other hand, it still remains unclear what stylistic reasons affect speakers' choice between apical and uvular /r/.

#### 5. On the vital importance of being variable

As this introduction illustrates, the book's aim was not to unravel the complex question of the phonological unity of /r/, but rather to pursue the empiricist idea of offering data-based descriptions of /r/ in many different languages and, possibly, from different theoretical angles. Therefore, the collection of papers taken as a whole reflects the understanding that, in order to explain the variability of /r/, a broad cross-linguistic framework is needed (as proposed, for example, in Lindau 1985). In addition, most of the papers share another basic feature of the empirical view, namely the experimental context of data collection and the instrumental method of analysis.

Thus, if the aim appears to be very elementary in nature, the combination of experimental method and cross-linguistic perspective may nevertheless lead to some important consequences for a phonology of rhotics. First, the evidence coming from articulatory data (notably EMA and UTI) sheds new light on the characteristics of the class of rhotics, both in terms of static configuration and dynamic behavior (see for instance the coordination patterns of rhotics in consonant clusters). Moreover, and especially regarding UTI, the rich representation of lingual shapes and movements implies a change in the received categories of the sounds' articulation, and consequently it fosters the reformulation of the current terminology.

Second, the adoption of a cross-linguistic framework has several advantages, which in a very straightforward way allow us to:


Taken together, the three points address the general topic of the role of within-system /r/ variability as a constant component in the sound systems of the world's languages, thus showing the importance of such a variable class of sounds as a functional and vital element in a fully fledged phonological system. As this introduction attests, the book raises many issues. We hope these issues will be as much a source of inspiration to everybody working on rhotics as they have been to us.

#### References


# Part I

Language acquisition and bilingualism

## On the Place of Rhotics: A case study on the acquisition of French /ʁ/

#### Marijn van 't Veer, Leiden University Centre for Linguistics

#### Abstract

In this paper, we discuss the acquisition of /ʁ/ for two children acquiring French, for one of whom, /ʁ/ triggers within-cluster assimilation of coronal obstruents. This is conspicuous, as French has a placeless rhotic. Accordingly, the rhotic of the other child is the target of place assimilation. Rose (2000, 2003) attributes the difference to the fact that the French rhotic is phonetically fricative-like, whereas it behaves – phonotactically – like a liquid. Hence, two possible sources of information for the acquiring child contradict each other. We discuss cross-linguistic evidence for and against place-bearing rhotics, concluding that both possibilities exist. To see to what degree the /ʁ/ is the same in the two children, we present an acoustic study, after which we demonstrate a reconstruction of the possible path of acquisition of Théo. Finally, we discuss the relevance of phonetic measurement for phonological patterns.

#### 1. Introduction

In deciding which features to use when storing words and their segments, children must reconcile multiple sources of evidence. On the one hand, phonetic similarity and distributional properties play an important role (Maye & Gerken 2000; Maye, Werker & Gerken 2002; Maye & Weiss 2003). On the other hand, we know that children are sensitive to the phonotactic patterns of their surrounding language from the age of nine months (Saffran & Thiessen 2003). In some instances, these two sources provide contradictory cues.

One such case is French /ʁ/. Phonetically a fricative, or at least very fricative-like (see, for example, Rose 2000:8), phonotactically it patterns with the other liquid in the language, /l/. Thus, learners of French must find some way to weigh these two conflicting sources of evidence in such a way as to arrive at an adult-like grammar.

It is not surprising that children have difficulty with this. Rose (2000) describes two learners of French, Clara and Théo, who have differing acquisition patterns when it comes to /ʁ/. What is especially striking is that Théo's /Cʁ/ onset clusters display a very robust pattern of cluster-internal dorsal assimilation, where the /ʁ/ is the trigger and coronal obstruents are targets. This is a highly remarkable pattern, because cross-linguistically, there are very few cases where rhotics are specified for place of articulation, let alone where they trigger assimilation. In his analysis of these data, Rose (2000: chapter 5) proposes that the differences between the development of /ʁ/ observed in the two children stem from different underlying representations: Clara's rhotic is placeless, whereas Théo has posited an underlying feature [dorsal] for his /ʁ/<sup>1</sup>. Rose (2000) attributes this difference to the phonetics of French /ʁ/, namely that it is a uvular across the board (Rose 2000:244-5, 261), and uvular consonants can be analyzed as [dorsal] (Rice 2011). This idea is further expanded upon in Rose (2003), where the author points to the fact that in adult (Québec) French, the rhotic often surfaces as a uvular fricative in branching onsets (where the head is a voiceless obstruent).

Based on the data observed and the analyses proposed in Rose (2000, 2003), it would appear a viable option that Clara's initial hypothesis is that /ʁ/ is a liquid, whereas Théo's initial hypothesis might be that it is an obstruent. This would imply that Clara places more emphasis on the phonotactic evidence, and Théo more on the phonetic evidence (see also Rose 2003:428). In this paper, we will attempt to see if we can find evidence for different representations in the acoustic signature of the rhotics of both children. We will follow Rose's hypothesis that the phonetics of French /ʁ/ contradict its phonotactic distribution, and that this is the reason for the difference. Whereas Rose (2000, 2003) focuses mainly on the phonetics of Place of Articulation, we will also consider the manner specification of /ʁ/ in the respective grammars of both children.

In the next section, we will briefly go over some typological data to see whether we can find cross-linguistic evidence for either placeless or place-bearing rhotics. In section 3, we will show the acquisition pattern of the two children's rhotics in more detail. Section 4 presents a tentative acoustic study, and section 5 concludes.

<sup>1</sup> It should be noted that there are more differences between the two children regarding their /ʁ/; reasons of space prevent us from going into much more detail, but see section 3 below, and Rose (2000) for a full description.

#### 2. A typology of rhotics and Place of Articulation

Liquids are among the most elusive and difficult-to-understand of all phonemes. With regard to rhotics, phonological discussion focuses mainly on the feature specification. One area of disagreement is whether rhotics have a Place of Articulation (henceforth: PoA) specification. The most radical proponent of rhotic PoA is Walsh Dickey (1997), who goes so far as to say that rhotics are universally defined by a specific PoA specification (that is, all rhotics have a secondary laminal node). The other position has less radical proponents, but deserves consideration nonetheless. In the following sections, we will review some evidence for and against both positions. If not indicated otherwise, the examples are from synchronic phonology.

#### 2.1 Placeless rhotics

Despite Walsh Dickey's proposal, rhotic placelessness appears to be the default position in the literature. In this section, we will review some of the reasons why this is so.

A general indicator of the presence of a feature in an underlying representation is that the segment it belongs to displays some phonological behavior (the 'natural class' argument). Hence, we will look at the phonological behavior of rhotics. In general, rhotics do not trigger any alternations involving PoA (but see section 2.2 for some counterexamples). In addition, they often escape rules that otherwise target PoA, such as coda place assimilation:

#### (1) Place assimilation in Italian

a. Diachronic obstruent-obstruent assimilation: cognates


	- c. Liquids in codas


As 1c illustrates, both laterals and rhotics escape the coda condition on place. Something similar arises if we consider the distributional properties of the word-final consonant in Italian; in general, Italian has no word-final codas. However, in some function words, such as *per* 'for, to, through', consonants appear word-finally. Most of these cases involve liquids. We can attribute both patterns to a general ban on independent PoA in codas; the ban does not apply to rhotics because they are placeless.

Another indicator of placelessness is found in Backley (2011), who notes that the rhotic in (British) English stands in a similar relation to /a/ as /i/ to /j/ and /u/ to /w/, in that they group together in glide/liquid alternations. Backley proposes that the rhotic in English is a glide, and its vocalic counterpart is /a/. This low vowel is usually thought to be underspecified for place (Hall 2011), and by analogy, the same would hold for the rhotic.

In a study of onset cluster phonotactics in Germanic languages, Goad & Rose (2004) note an asymmetry in the distribution of clusters where the obstruent is coronal. Consider the onset cluster inventory of Dutch, for example, in Table 1. Laterals can cluster with coronal fricatives, but not stops, and rhotics can cluster with coronal stops, but not fricatives. The analysis that Goad & Rose (2004) propose rests on two points. First, laterals are [coronal], whereas rhotics are placeless. Second, there is a difference between 'real' onset clusters and 'apparent' onset clusters: whereas most obstruents in onset clusters are actually in the onset constituent, this does not apply to /s/, which is syllabified as an appendix<sup>2</sup>. The absence of /tl/ in this inventory, Goad & Rose (2004) argue, is due to a restriction on identical places of articulation in a cluster (/pʋ/ onset clusters are also banned in Dutch, as are /kx/ clusters; the same constraint holds in German and English). If so, the reason /tr/ is licit must be that the rhotic is not coronal. The ungrammaticality of /sr/ clusters is due to the fact that an appendix must be licensed, and can only be licensed by a 'strong' onset. The rhotic is not strong enough, because it lacks a place specification.


Table 1 – Onset cluster inventory of Dutch.

<sup>2</sup> The strange behavior of sC clusters is one of the most famous problems in phonology. See Goad (2011) for an overview.

A similar situation holds for English (which also bans /θl–/) and German. The German case is slightly different, as the language allows for /ʃr/ onset clusters; see Goad & Rose (2004: section 4.3) for details, which however do not change the analysis of the rhotic as placeless.

Rose & Demuth (2006) show that in English and Afrikaans loanword adaptation in Sesotho, epenthetic vowels breaking up illegal onset clusters obtain their place of articulation from their leftward environment. In word-initial context, the left environment is constituted by the first member of the cluster – in other words, a consonant. Word-medially, where there is a vowel to the left of the cluster, it is the vowel that supplies the feature. A number of exceptions to this pattern exist, however. First, dorsal consonants do not supply a place of articulation to the vowel. Secondly, the vowel /ɑ/ is only copied if no other source is available (Rose & Demuth 2006: section 3.3). What is most important for our present purposes is that neither /l/ nor /r/ ever supplies a place of articulation feature to an epenthetic vowel. In (2) there are some examples<sup>3</sup>.

(2) Epenthetic vowels in Sesotho loanword adaptation


	- c. Word-medial clusters: left-to-right from vowel



<sup>3</sup> The examples are all taken from Rose & Demuth (2006). I have adopted their transcriptions.

It would be going too far at this point to reproduce the entire analysis developed in Rose & Demuth (2006). What is important, however, is that the rhotic in Sesotho (and the lateral) behaves as a placeless segment in loan word adaptation, as we can see in example (2e).

A final argument comes from van Oostendorp (2001). Discussing two dialects of Dutch, van Oostendorp (2001) argues that /r/ is not only placeless, but actually featurally empty. In the Brabantic dialect of Tilburg, /r/ patterns with fricatives word-finally, and with sonorants elsewhere. More directly related to place, in Maasbracht Dutch (Limburg), a contrast exists between falling and 'dragging' tone. Tone is realized on the main stressed vowel of the word, but minimal pairs exist only for rimes consisting of long vowels and vowels followed by sonorants:

(3) Tonal minimal pairs in Maasbracht Dutch


b. Obstruent-final rimes


This contrast exists because sonorants can be moraic, whereas obstruents cannot, and falling tone is represented by a single high tone on the nucleus, whereas dragging tone consists of two high tone features. Rhotics display a dual behavior. Word-internally they pattern with sonorants, in that rhotic-final rimes can have falling as well as dragging tone, but word-finally, they behave as obstruents: no tonal contrast exists.

	- a. Word-internal rhotic final rimes


b. Word-final rhotic-final rimes


In this respect, rhotics pattern exactly like /ŋ/, the placeless nasal (see van Oostendorp 2001 for references, and Rice 1996 for arguments for the placelessness of /ŋ/).

#### 2.2 Rhotics with PoA

Our first example in which rhotics display evidence of a place of articulation feature comes from Selayarese (Mithun & Basri 1986). In morphological reduplication, word-final velar nasals assimilate to the adjacent onset. Consider the examples in (5)<sup>4</sup>:

(5) Reduplication in Selayarese


The telling example here is the final one, in which the nasal surfaces as a coronal if followed by the rhotic. The pattern holds over word boundaries, as can be seen in the following examples involving the numeral *annaŋ* 'six'.

(6) annam poke, annan tau, annaɲ jaraŋ, annaŋ golo, annan rupa

Thus, Selayarese exemplifies the possibility for rhotics not only to have a PoA, but an active one, too.
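The alternation in (6) can be restated as a small lookup, which makes explicit what the analysis commits to. The following is only a minimal sketch: the function name `assimilate` and the two tables are hypothetical, the place labels are inferred from the surface forms in (6) rather than taken from Mithun & Basri (1986), and classing the rhotic as coronal is the analytical claim being illustrated, not an independent fact.

```python
# Place classes for the onsets attested in (6). These labels are ASSUMPTIONS
# inferred from the surface nasals; "r" is listed as coronal because that is
# the hypothesis under discussion.
ONSET_PLACE = {
    "p": "labial",
    "t": "coronal",
    "j": "palatal",
    "g": "dorsal",
    "r": "coronal",
}

# Nasal that surfaces for each place of articulation.
NASAL_FOR_PLACE = {"labial": "m", "coronal": "n", "palatal": "ɲ", "dorsal": "ŋ"}


def assimilate(numeral: str, noun: str) -> str:
    """Replace a word-final velar nasal by the nasal homorganic with the next onset."""
    if not numeral.endswith("ŋ") or not noun:
        return f"{numeral} {noun}".strip()
    place = ONSET_PLACE.get(noun[0])
    # If the onset supplies no place, the nasal keeps its default velar form.
    nasal = NASAL_FOR_PLACE.get(place, "ŋ")
    return f"{numeral[:-1]}{nasal} {noun}"


for noun in ["poke", "tau", "jaraŋ", "golo", "rupa"]:
    print(assimilate("annaŋ", noun))
```

If the rhotic were removed from `ONSET_PLACE` (that is, treated as placeless), the sketch would yield a velar nasal before *rupa*, which is not the attested form.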

Selayarese is not alone in this respect; Chukchee (Lewis 2009) also has a pattern wherein a velar nasal assimilates to the following onset. Blevins (1994) gives the following examples:

<sup>4</sup> Apart from the role of the rhotics, this process is also interesting in connection with van Oostendorp's (2001) argument for the phonological placelessness of /ŋ/. See also Rice (1996) on the relation between coronals and velars, and why they are both often seen as 'unmarked' or 'default'; one possible alternative analysis to the one proposed here is that the patterns in 5, 6 and 7 are the result of two interpretations of the same underlying, placeless consonant. However, this would leave unexplained the fact that the rhotic patterns with the other coronals.

(7) Nasal assimilation in Chukchee


Again, we see that the rhotic triggers the phonologically placeless nasal to surface as a coronal, whereas its default surface form (as can be seen in the first example, teŋ-əl̥ʔ-ən) is velar.

The examples from Selayarese and Chukchee involve a primary coronal place of articulation for the rhotic in these languages, but Sanskrit presents us with an example of a language in which the rhotic has an active secondary PoA feature, namely through the process of retroflexion (see 8):

(8) n → ṇ / {ṣ, r}[...]

That is, the rhotic patterns with the retroflex fricative in triggering retroflexion on coronal nasals. Some examples are given in 9 (from Avery & Rice 1989):

(9) Retroflex harmony in Sanskrit


The examples in 9a and 9b show that vowels and consonants respectively are transparent to retroflex harmony (perhaps unsurprisingly). The examples in 9c, however, show that not all consonants are transparent: coronals block harmony. Hence, a straightforward hypothesis is that the triggers are coronals with a secondary feature [retroflex]. This entails that the rhotic in Sanskrit is a coronal.

Finally, let us investigate an example where the PoA feature is not coronal, but something else (possibly [back]), and where the evidence is diachronic rather than synchronic. In Old English Breaking, front vowels underwent a diachronic process of diphthongization when followed by [back] consonants (Baker 2007; Barber 1997, among many others):

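(10) Old English Breaking description

$$
\left.
\begin{array}{l}
\text{[a] or [ɑ]} \rightarrow \text{[æɑ] or [ɛɘ]} \\
\text{[e] or [ɛ]} \rightarrow \text{[eo] or [eʊ]} \\
\text{[i] or [ɪ]} \rightarrow \text{[ɪʊ]}
\end{array}
\right\}
\; / \; \underline{\quad}
\begin{bmatrix}
+\,\text{consonant} \\
+\,\text{continuous} \\
+\,\text{back}
\end{bmatrix}
+ \text{C}
$$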

Example 11 lists a number of Proto-Germanic words, and their Old English counterparts. The pattern should by now be clear: the rhotic participates in a process that is best characterized by crucial reference to a PoA feature (as in 10 above).



#### 2.3 Summary

In this section, we have looked at a number of languages and a number of reasons why rhotics should be either placeless or place-bearing phonologically. In the end, evidence can be found for either position. However, placelessness seems to be the default, as cases in which rhotics are active in place-related phonological processes (either diachronic or synchronic) are rare (see also Rose 2000, 2003). In the next section, we will examine the acquisition patterns of two learners of French, who show remarkably different patterns when it comes to the rhotic. The case at hand, in which the child exhibits evidence of a place-bearing rhotic when the surrounding language does not, raises the question of whether the child acquired the sound as a rhotic in the first place. This question is especially relevant with respect to French, with its fricative-like rhotic.

#### 3. The acquisition patterns of Clara and Théo

In this paper, we investigate the phonetic contours of the rhotics of two learners of Québec French: Clara and Théo (Rose 2000)<sup>5</sup>, who, for all intents and purposes of the present study, are acquiring the same (Eastern) dialect of Québécois. The segmental inventory of Québécois French is, as far as consonants are concerned, identical to the segment inventory of European French (Rose 2000). The primary data consist of spontaneous speech, recorded roughly bi-weekly from age 1;00.27 to 2;08.19 for Clara, and from age 1;10.26 to age 4;00.00 for Théo. They were first published in Rose (2000), and the observations and examples given in this section rely heavily on that work.

<sup>5</sup> The data and software used in this study are freely available from PhonBank, see http://childes.psy.cmu.edu/phon/.

Cross-linguistically, as we have seen, rhotics may either bear a place feature, or they may not, where the latter is the unmarked situation. In the Goad/Rose corpus of Québec French, we see this variation exemplified. Clara appears to represent her /ʁ/ as a placeless liquid from the beginning. She does not show any behavior that would indicate otherwise. On the other hand, Théo seems to go for the dorsal option. As shown by Rose (2000, 2003), this is evidenced by a pattern of dorsal assimilation in branching onsets where /ʁ/ combines with a coronal, throughout the entire period for which Théo was recorded.


As becomes clear from these examples, Théo's rhotic has a dorsal place feature that triggers assimilation of coronal obstruents in the same onset constituent. There is, however, no phonological evidence (e.g. from spreading, blocking, or other phonological phenomena) for any place feature in rhotics in the surrounding language. Either his /ʁ/ is phonologically a rhotic with a [dorsal] feature, or it is a dorsal fricative with peculiar phonotactic properties. In the first case, Théo overspecifies his /ʁ/<sup>6</sup>; in the second case, he violates a phonotactic rule of French: there are no stop-fricative onset clusters<sup>7</sup>.

<sup>6</sup> But see Hale & Reiss (2003) for an argument why this could be expected.

<sup>7</sup> Save for some learned exceptions such as *psaume* 'psalm', *psychologie* 'psychology', which are very small in number.

One clue comes from looking at the timeline of development of Théo's /ʁ/. The first instance of /ʁ/ *per se* occurs relatively late, but the segment surfaces target-like from its inception. It occurs in word-final position during the same period when other consonants do so. In contrast to Théo's rhotics, which trigger assimilation in clusters, Clara's rhotic undergoes place assimilation in singleton onsets in early sessions:

(13) Non-adjacent place assimilation


Although superficially this is very similar to patterns of Consonant Harmony she displays, the timeline is not identical, and furthermore, in the case of /ʁ/, there is no directionality restriction (that is, /ʁ/ can receive its PoA from either the left or the right). This, Rose (2000) proposes, is because Clara's rhotic is devoid of any PoA of its own, reflecting the cross-linguistically unmarked case. Clara's /ʁ/ behaves in all respects like a rhotic, whereas Théo's represents the dual identity of the segment in the environment language. This raises the question of what underlying representation Théo has, other than the obvious [dorsal] feature, particularly in terms of Manner features.

#### 4. A tentative acoustic study

The different acquisition patterns of Clara and Théo closely resemble the dual nature of the French rhotic: it is both liquid-like and fricative-like. It is especially interesting why Théo would posit a place-bearing rhotic, since there is no phonological evidence (e.g. from spreading) for this in his input, even though the acoustic evidence is potentially misleading – although we have seen that place-bearing rhotics are cross-linguistically not ruled out. Thus, the question arises whether Théo is acquiring a rhotic in the phonological sense, or whether he is hypothesizing a fricative with highly marked phonotactic properties. Hence, we set out to investigate the acoustic characteristics of both children's rhotics.

#### 4.1 Items

From both Clara and Théo, 40 tokens of faithfully produced prevocalic<sup>8</sup> rhotics were selected, from both singleton and cluster onsets. Since the recordings are all of spontaneous speech, and made in a living room situation, not all tokens were suitable. Unsuitable tokens were those in which, during the period in which the rhotic was uttered, another voice was audible, background noise was present, someone present at the session apparently touched or breathed into the microphone, or where microphone hum was unacceptable. In order to avoid an uneven representation of rhotics produced in single words, the number of tokens from the same lexical item was limited to three. All tokens were studied in Praat (Boersma & Weenink 2012).
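The selection procedure just described could be implemented roughly as follows. This is a minimal sketch under stated assumptions, not the workflow actually used: the token records, the `suitable` flag, the `word` key and the function name `select_tokens` are all hypothetical, and taking tokens in recording order is an assumption the text does not state.

```python
from collections import Counter


def select_tokens(tokens, per_item_cap=3, target=40):
    """Keep suitable tokens in order, at most `per_item_cap` per lexical item.

    `tokens` is a list of dicts with the HYPOTHETICAL keys "word" (lexical
    item) and "suitable" (False if the token overlaps with another voice,
    background noise, microphone handling or hum).
    """
    kept = []
    seen = Counter()
    for tok in tokens:
        if not tok["suitable"]:
            continue  # excluded on recording-quality grounds
        if seen[tok["word"]] >= per_item_cap:
            continue  # this lexical item is already represented three times
        seen[tok["word"]] += 1
        kept.append(tok)
        if len(kept) == target:
            break
    return kept


# Illustrative records only; "word_a" and "word_b" are placeholders.
demo = (
    [{"word": "word_a", "suitable": True}] * 5
    + [{"word": "word_b", "suitable": False}]
    + [{"word": "word_b", "suitable": True}] * 2
)
print(len(select_tokens(demo)))
```

The cap per lexical item is what keeps a single frequent word from dominating the sample; the quality filter is applied first so that rejected tokens do not count towards the cap.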

#### 4.2 Criteria

Six criteria were used, from general to more rhotic-specific. These are listed below, along with a brief description of how they were applied. Some of the measures involve the degree to which the segment is 'sonorant-like', mostly with respect to voicing (voice and harmonics-to-noise ratio HNR). Measuring trillness, of course, is specific to rhotics. Two measures of PoA were also taken, as we might assume that a phonological specification of [dorsal] in Théo's case might lead to a smaller standard deviation (because a phonological target is present).

*Length*. The boundaries of each item were determined by exclusion; that is, the end of the section before the rhotic was determined, as was the start of the section after the rhotic. The remaining section was designated as 'rhotic'. It was expected that this exclusive criterion would provide more objective results than any inclusive criterion.

*Voice*. Whether a given token is voiced was determined on the basis of the presence or absence of voice bar throughout the duration of the rhotic. For the purposes of the present study, voicing is treated as a binary variable.

*Trillness*. Even in a language like French, with its fricative-like rhotic, some tokens involve a Bernoulli-effect-induced pulse stemming from the uvula hitting the tongue root. In the current study, each token was inspected both impressionistically and spectrographically to see whether such pulses are present. 'Trillness' is treated as a binary variable.

*HNR*. The harmonics-to-noise ratio is a measure of the amount of energy in the signal that is present in harmonics relative to the amount of energy in the signal that is not; in other words, it measures the 'fricativity' of a given auditory segment.

<sup>8</sup> For the purposes of the present study, pre-glide rhotics were also included.

*F3*. A characteristic of both apical and uvular trills is that they induce lowering of the third formant (Ladefoged & Maddieson 1996). As children's voices are different from adult voices, filters in Praat were adjusted to the following settings prior to performing measurements: the number of formants to look for was limited to three, and in the spectral filter the window length was set to 0.0025.

*Center of Gravity*. For those items for which no F3 could be measured, because there was not enough formant structure, the Center of Gravity (henceforth: COG) was measured instead. The COG takes into account the energy distribution of noise and determines where it is centered. Hence, it is a measure of relative backness and frontness, whereby a higher COG corresponds to a more forward PoA.
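Two of the measures above, HNR and COG, have simple closed-form definitions, which the following sketch spells out. It is meant only as an illustration: the formulas are the standard textbook ones (a decibel ratio of harmonic to noise energy, and a magnitude-weighted mean frequency), not a reproduction of the Praat procedures used in the study. The function names and the default weighting `power=2.0` are assumptions.

```python
import math


def hnr_db(harmonic_energy: float, noise_energy: float) -> float:
    """Harmonics-to-noise ratio in dB: 10 * log10(harmonic / noise).

    0 dB means equal energy in harmonics and noise; lower values mean a
    noisier, more fricative-like signal.
    """
    return 10.0 * math.log10(harmonic_energy / noise_energy)


def center_of_gravity(freqs_hz, magnitudes, power=2.0):
    """Spectral centre of gravity: mean frequency weighted by magnitude**power.

    ASSUMPTION: power=2.0 (weighting by the power spectrum).
    """
    weights = [m ** power for m in magnitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * w for f, w in zip(freqs_hz, weights)) / total


# Ten times more harmonic than noise energy gives 10 dB.
print(hnr_db(10.0, 1.0))
# Energy concentrated at the lower component pulls the COG down.
print(center_of_gravity([1000.0, 3000.0], [2.0, 1.0]))
```

On this definition, energy concentrated at higher frequencies raises the COG, which is why a higher value is read as a more forward constriction.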

#### 4.3 Results

The power to extrapolate conclusions from any kind of statistical test on data from two subjects is extremely limited. The results derived from the current study should therefore be treated as indications rather than conclusions. Having said that, the most apt test for these data is the Mann-Whitney U-test, an alternative to the t-test that is non-parametric and allows for unequal samples. For the criteria for which binary measures were performed, a χ<sup>2</sup>-test was applied.

*Length*. On the whole, Clara's rhotics are somewhat longer than Théo's: 17.37 ms vs. 14.42 ms. On the other hand, she also has a larger standard deviation: 8.32 ms vs. 5.17 ms. A Mann-Whitney U-test yielded no significant result (z=1.49, p>.5).

*Voice*. The number of voiced tokens in Clara's sample is much higher (26) than in Théo's (12). This translates to a proportion of .33 for Théo and .66 for Clara. A χ<sup>2</sup>-test was significant: χ<sup>2</sup>=7.0413, p<.01.

*Trillness*. Although the French rhotic is not necessarily known as a trill, trilled tokens do occur. There were 11 in Théo's sample (proportion: .31), and 15 in Clara's (proportion: .39). This does not translate to a significant result in the χ<sup>2</sup>-test: χ<sup>2</sup>=.2265, p>.5.

*HNR*. The mean HNR for Clara's rhotics in this study is 6.7601 dB (SD: 3.9355), whereas Théo's mean is 3.3593 dB (SD: 4.1776). This corresponds to a significant difference in the Mann-Whitney U-test: Z=1.7, p<.05.

*F3*. Théo's sample rhotics are produced with a mean F3 of 4213.86 Hz (SD: 243.38), and Clara's sample has a mean of 4304.28 Hz (SD: 350.91). The Mann-Whitney U-test yielded no significance: Z=1.25, p>.1.

*COG*. The center of gravity in Clara's sample has a mean of 1272.29 Hz (SD: 565.88). For Théo's sample, the center of gravity is somewhat higher: 1644.14 Hz (SD: 817.99). This is a non-significant difference in the Mann-Whitney U-test: Z=-1.1, p>.1.
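The two kinds of test reported above can be sketched in a few lines. This is a hedged illustration, not the analysis script used for the study. Two points are assumptions: first, the reported proportions are consistent with usable samples of 39 tokens for Clara and 36 for Théo, which is inferred here and not stated in the text; second, the χ<sup>2</sup> statistic is computed with Yates' continuity correction, which is a guess that happens to match the reported values. The Mann-Whitney part uses the normal approximation without tie correction and is shown on made-up numbers.

```python
import math


def chi2_yates(a, b, c, d):
    """Yates-corrected chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_p_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def mann_whitney_z(x, y):
    """Return (U for sample x, z) using the normal approximation, no tie correction."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average rank for tied values
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, z


# Voiced / not voiced. The second count in each row rests on the ASSUMED
# sample sizes of 39 (Clara) and 36 (Théo).
voice = chi2_yates(26, 13, 12, 24)
print(round(voice, 4), chi2_p_df1(voice) < 0.01)   # about 7.0413, True

# Trilled / not trilled, same assumed sample sizes.
trill = chi2_yates(15, 24, 11, 25)
print(round(trill, 4), chi2_p_df1(trill) > 0.5)    # about 0.2265, True

# Mann-Whitney on invented values, purely to show the call.
print(mann_whitney_z([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

That both reported χ<sup>2</sup> values come out under these assumptions suggests that some tokens were excluded after the initial selection of 40, although the text does not say so explicitly.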

In this section, we looked at the acoustic characteristics of the rhotic productions of Clara and Théo, two children acquiring Québécois French. The two children display markedly different acquisition patterns, which correlate with either the phonotactic (Clara) or phonetic (Théo) identities of /ʁ/. For the six criteria we applied, significant differences were found only for voicing and HNR. The findings are summarized in Table 2. The results are not unequivocal. I take this to mean that the children are aware of, and struggling with, the dual identity of /ʁ/. In the next section, we will discuss some of the implications of this study.


Table 2 – Summary of the results.

#### 5. Discussion

The multitude of ways in which rhotics manifest themselves, both phonetically and phonologically, has puzzled many linguists. As we have seen, children also struggle with rhotics during the course of phonological acquisition: Théo's /ʁ/ triggers dorsal assimilation when it combines with coronal obstruents in onset clusters. On the other hand, the French rhotic is remarkably fricative-like in its acoustic signature, which could have caused the child to parse it as an obstruent. A dorsal fricative is, of course, much less remarkable. Thus, the dual nature of the French rhotic appears to be a cause for confusion.

Another case of dual nature is Positional Lateral Gliding as described in Inkelas & Rose (2008). Inkelas & Rose describe a pattern in the phonology of E., a child acquiring (American) English, who during a certain period does not produce faithful tokens of /l/. Instead, E. substitutes the glides /j/ and /w/, but not in a random way: in 'strong' positions (onsets of words and stressed syllables), E. substitutes /j/, whereas in 'weak' positions (onsets of unstressed syllables, codas), /w/ is inserted. This is interesting for our present case, because /l/, like /ʁ/, has a dual nature – albeit in a different way. Whereas in the case of the rhotic, there is a conflict between its phonetic contour and its distributional properties, the duality in the lateral lies in the fact that it involves both a coronal and a dorsal gesture<sup>9</sup>. In the case of E., the dual nature of /l/ manifests itself in the grammaticalized patterns of a single child, whereas what we see here is that two children each opt for a different route – albeit not with full confidence. It would be interesting to investigate the data from more children acquiring Québécois French, to see where Clara and Théo fit into the general picture.

Théo's grammar undoubtedly is not in the adult stage. Although he knows the features of his language, some fine tuning must ensue. If it is true that Théo's /ʁ/ is indeed a dorsal fricative, he has two options: either he must live with the fact that one of the fricatives of his language has phonotactic properties different from the others, or he must revise the featural make-up of the segment. If indeed he knows the sound is a liquid, no such revision is necessary. However, in all cases, he must stop assimilating onset clusters. Again, two possibilities exist: either he must drop the dorsal specification, or 'unlearn' the rule that enforces the assimilation. We do not have data from the moment at which assimilation ceased, but we know it did so shortly after the final recording (Rose 2000:238, footnote 3).

We set out to investigate whether Théo and Clara had different phonetic rhotics, because Théo's acquisition pattern of the segment is markedly un-rhotic-like. We were unable to find conclusive evidence, and so we cannot know with certainty what the right answer is. We can, however, attempt an informed speculation as to the scenario. Children appear to adhere to suprasegmental structures to the extent that these form the basis of their substitution patterns, as exemplified not only by E. as described above, but by many others as well (Chiat 1989; Pater 1997; Rvachew & Andrews 2002; Marshall & Chiat 2003). Children are also sensitive to their native language's phonotactics from a very early age (Saffran & Thiessen 2003). Given, in addition, that the option of a place-bearing rhotic exists cross-linguistically, and that the acoustic study, with all its limits and caveats in mind, gives no conclusive evidence to the contrary, I propose that Théo's /ʁ/ is, in fact, a rhotic – even if it is a place-bearing one.

In this study, we examined the place of articulation specification of rhotics in a number of ways; we considered typological and diachronic evidence and corpus evidence from acquisition (as presented in Rose 2000). Finally, we set out to perform acoustic tests over the production data from two children who appear to have different underlying representations for their rhotics, presumably stemming from the segment's dualistic nature in their surrounding language. No conclusive evidence could be found in the acoustic measurements; only some criteria reached significance, and in the case of the PoA-criteria, the asymmetry goes in opposite directions in the two tests. Of course, it is possible that with a larger sample of children, and a larger sample of items, a more unequivocal picture would arise. On the other hand, it is very possible that the fact that no conclusive acoustic difference could be found between two subjects, who show phonological evidence of having different underlying representations, is simply a reflection of the fact that the acoustic input is the same for both children. In this sense, the current results are an illustration of the observation that phonetic measuring cannot always probe into phonological representation.

<sup>9</sup> Incidentally, the substitution pattern of E. closely follows the distribution of /l/ in his surrounding language (English): light /l/ in onsets, dark /l/ in codas. E. generalizes this pattern to strong and weak positions.

The non-idiosyncratic relation between phonetics and phonology has been pointed out in many previous publications, perhaps most strongly in Substance-Free phonology (see Hale & Reiss 2008, for example). Furthermore, in their large-scale overview of studies on the acquisition of artificial phonological grammars, Moreton & Pater (to appear) find very little evidence for phonetic complexity as a factor in determining learnability. Rather, they show that structural (featural) complexity is a much better predictor of the relative difficulty of a learning task. In the current study and the works on which it builds (Rose 2000, 2003), the learners' systems are credited with a certain degree of abstraction. That is, learners construct their representations not based on acoustic information only – which is in line with the conclusions in Moreton & Pater (to appear). The current finding that different underlying representations do not necessarily lead to different acoustic signatures may actually reinforce the idea that phonological learning is abstract to a fairly high degree. In fact, when a child such as Théo apparently has difficulty integrating acoustic and phonotactic/distributional evidence, the effects are seen in the phonological behavior rather than in the corresponding phonetics.

#### Acknowledgements

Much of the work reported here was done in collaboration with the students attending the 2010 *sound and sound structure* class at the Department of Linguistics of Leiden University: Matthias Franken, Saskia Lensink, Renée Middelburg, Nina Ouddeken and Marijn Verschuure. I am also indebted to advice from Jessie Nixon, Kathrin Linke and Allison Kirk, and feedback from the audiences of the *'r-atics-3* conference in Bozen-Bolzano, 2011, and the ninth *Old World Conference in Phonology*, 2012, Berlin, and from Marc van Oostendorp and Yvan Rose. Of course, all errors are my own.

#### References


of difficult phonetic contrasts. In Barbara Beachley, Amanda Brown & Frances Conlin (eds.), *Proceedings of the 27th annual Boston University Conference on Language Development*, 508-518*.* Somerville: Cascadilla Press.


van Oostendorp, Marc. 2001. The phonology of postvocalic /r/ in Brabant Dutch and Limburg Dutch. In Hans Van de Velde & Roeland van Hout (eds.), *'r-atics. Sociolinguistic phonetic and phonological characteristics of /r/*, 113-122. Bruxelles: ILVP.

Walsh Dickey, Laura. 1997. *The phonology of liquids.* PhD thesis, University of Massachusetts, Amherst.

## Acquisition of English [ɹ] by adult Pakistani learners

#### Nasir A. Syed, University of Essex

#### Abstract

The paper is based on perception and production tests conducted with 90 adult Pakistani learners of English with the aim of studying their acquisition of English [ɹ]. The study is conducted in the SLM paradigm, hypothesizing that learnability of an L2 sound is proportional to the perceived phonetic distance between the target L2 and the corresponding L1 sound. The results show that Pakistani learners can discriminate English [ɹ] from [w] and [l] but they develop strong equivalence classification between English [ɹ] and the L1 [r] in their L2 phonemic inventory.

#### 1. Theoretical background

Various models have been developed to account for acquisition of L2 sounds by adult learners. The Speech Learning Model (hereafter SLM) by Flege (1995) is one such model which particularly focuses on advanced/experienced learners (Best & Tyler 2007). The model predicts a correspondence between perception and production of L2 sounds. According to the SLM, L2 learners produce sounds of an L2 in the way they perceive them (Flege 1995:239). The model further predicts that if a particular sound of the L2 is perceived by L2 learners as different from the closest L1/L2 sound(s), a new phonetic category is developed by the learners for the L2 sound. But, if they cannot perceive a difference between an L2 and the closest L1 (or L2) sound, equivalence classification between the two sounds (where two sounds are equated to each other) takes place, which blocks the establishment of a separate phonetic representation for the L2 sound. According to Flege (1995), learnability of an L2 sound is proportional to the perceived phonetic distance between the L2 sound and the closest sound(s) of either the L1 or L2. The SLM provides seven hypotheses which predict learning outcomes in different contexts. Of those, the three hypotheses which are related to the current study are reproduced below from Flege (1995:239):

1. "A new phonetic category can be established for an L2 sound that differs phonetically from the closest L1 sound if bilinguals discern at least some of the phonetic differences between the L1 and L2 sounds."

2. "The greater the perceived phonetic dissimilarity between an L2 sound and the closest L1 sound the more likely it is that phonetic differences between the sounds will be discerned."

3. "Category formation for an L2 sound may be blocked by the mechanism of equivalence classification. When this happens, a single phonetic category will be used to process perceptually linked L1 and L2 sounds (diaphones). Eventually, the diaphones will resemble one another in production."

Studies conducted in the SLM paradigm normally use 'goodness of fit' tests arranged with either monolinguals or early-stage adult L2 learners to gauge how similar or different an L2 sound is from the closest L1 or L2 sounds. On the basis of such tests, the perceptual mapping of L2 sounds in the phonemic inventory of learners is determined and predictions about the expected learning pattern are made. For example, Guion et al. (2000) conducted an experiment with inexperienced Japanese learners of English to determine the perceptual mapping of the Japanese learners for English consonants. Levy (2009:2680) developed a "cross-language assimilation overlap method" which assumes that the percentage of overlap between L1 and L2 sounds in the perception of monolingual speakers of the L1 of a group of learners may be used to determine the perceptual distance between the L2 and the corresponding L1 sounds. In this study (Levy 2009) the results obtained with one group of subjects were used to develop hypotheses for other groups of L2 learners.

The current study focuses on perception and production of English [ɹ] by adult Pakistani learners who speak Saraiki as L1. Saraiki is an Indo-Aryan language spoken in central Pakistan (Shackle 1976) which has a rolled [r] with a phonemic aspiration contrast (see the phonemic inventory of Saraiki in Appendix A). Saraiki [r] has been defined by Varma (1936:80) in the following words:

"[r] is a rolled consonant generally accompanied by two rapid taps of the tongue against the teeth-ridge […]. In the initial position as in [ris (əris)] 'envy', it often tends to begin with a vocalic on-glide and sounds somewhat like [ər]."

Saraiki [r] is produced as a trill in stressed syllables, in emotional speech or in some rural dialects. There is free variation in Saraiki between rolled [r] with two taps and trilled [r] with continuous taps.

#### 2. Hypotheses

In order to develop hypotheses on the expected pattern of learning in light of the predictions of the SLM, we need to calculate the perceptual distance between English [ɹ] and the closest L1 and L2 sounds. The distance was calculated on the basis of overlap in the perception of Saraiki monolinguals, following the "cross-language assimilation overlap method" (Levy 2009). For this purpose, a perception test was conducted with 10 Saraiki monolinguals. The experiment was based on two discrimination tasks. The first was a three-alternative forced choice (3AFC) discrimination task, in which the participants were asked to listen to three sounds and determine whether any two of them were similar. The instructions were given to the monolinguals in the L1. There was one trial for each of the two sets of stimuli used in this test. The stimuli were nonsense syllables containing English sounds, spoken by a female native speaker of English (aged 27) and played in the following sequence:

1. [ala], [ana], [aɹa]

2. [aɹa], [awa], [aja]

The purpose of this test was to assess whether the Saraiki monolinguals assimilate English [ɹ] to [l], [w], [n] or [j]. In the discrimination of the [l], [n] and [ɹ] set, out of a total of 10 participants, 4 assimilated [ɹ] with [l] while 6 did not. None of the monolinguals assimilated [ɹ] with [n]. In the set of stimuli containing [ɹ], [w] and [j], 4 monolinguals discriminated [ɹ] from [w j] accurately. The remaining 6 assimilated [ɹ] with [w]. None of them assimilated [ɹ] with [j]. Thus the 3AFC discrimination test shows that the Saraiki monolinguals perceptually assimilate English [ɹ] with [l] and [w] but not with [j] or [n]. The sounds [w j l n] exist in the phonemic inventories of both Saraiki and English.

The second part of the experiment was an AX discrimination task in which a pair of VCV stimuli was played to the monolinguals, who were asked to determine whether these sounds were the same or different. The first member of the pair was a nonsense syllable [ara], comprising Saraiki [r] with the low vowel [a] on both sides, spoken by a female native speaker of Saraiki (aged 39), and the second was English [aɹa] spoken by a female native speaker of English. The pair was repeated three times in this test. The purpose of this test was to see whether the Saraiki monolinguals could perceive a difference between the English approximant [ɹ] and the L1 rolled [r]. Out of 10 monolinguals, only 2 discriminated English [ɹ] from the L1 [r] consistently in all three trials, and 2 discriminated it in one out of three trials. Thus the total percentage of accurate discrimination was 26.7%, while 73.3% of the time the monolinguals assimilated English [ɹ] to the L1 [r]. The overall results of the experiment are summarized (in percentages) in Table 1 below.


Table 1 – Perception test results with Saraiki monolinguals (in percentage).
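The percentages reported for the monolingual test follow directly from the response counts given in the text. The short sketch below retraces that arithmetic; the variable names are illustrative, and the per-listener breakdown of the AX task assumes that the six listeners not mentioned in the text never discriminated the pair, which is what the reported total implies.

```python
# Counts reported in the text for the 10 Saraiki monolinguals.
N_LISTENERS = 10

# 3AFC task: one trial per stimulus set, so one response per listener.
assimilated_to_l = 4  # set [ala], [ana], [aɹa]
assimilated_to_w = 6  # set [aɹa], [awa], [aja]

pct_l = 100 * assimilated_to_l / N_LISTENERS
pct_w = 100 * assimilated_to_w / N_LISTENERS

# AX task: three repetitions per listener.
# Assumption: listeners not mentioned in the text scored 0 out of 3.
TRIALS_PER_LISTENER = 3
correct_per_listener = [3, 3, 1, 1, 0, 0, 0, 0, 0, 0]

total_trials = N_LISTENERS * TRIALS_PER_LISTENER
correct_trials = sum(correct_per_listener)

pct_discriminated = 100 * correct_trials / total_trials
pct_assimilated_to_r = 100 - pct_discriminated

print(f"[ɹ] heard as [l]: {pct_l:.1f}%")
print(f"[ɹ] heard as [w]: {pct_w:.1f}%")
print(f"[ɹ] discriminated from L1 [r]: {pct_discriminated:.1f}%")
print(f"[ɹ] assimilated to L1 [r]: {pct_assimilated_to_r:.1f}%")
```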

Table 1 shows that Saraiki monolinguals perceptually assimilate English [ɹ] with the L1 [r] 73.3% of the time, while 26.7% of the time they discriminate it from the L1 [r]. The 3AFC test shows that they assimilate English [ɹ] with English [w] and [l] 60% and 40% of the time, respectively. Following the idea of overlap between sounds (Levy 2009), we assume that there may be a maximum of 73.3% overlap between English [ɹ] and Saraiki [r], 60% overlap between English [ɹ] and [w], and 40% overlap between English [ɹ] and [l] in the L2 phonemic inventory of the Saraiki learners of English. On the basis of these results we develop the following hypotheses about the expected learning pattern of Pakistani learners of English:


$$[\mathrm{l}] \to [\mathrm{w}] \to [\mathrm{r}]$$

Thus, if Saraiki learners can discriminate between English [ɹ] and Saraiki rolled [r], they will acquire the English [ɹ]. The likelihood of this is a maximum of 26.7% according to the perceptual mapping of the Saraiki speakers of English [ɹ] based on the monolingual test. If a difficulty is experienced, the interfering sounds are likely to be [l w r] with varying levels of interference as determined by the monolingual tests discussed above. To test these hypotheses, we conducted an experiment which is detailed in the following section.
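The reasoning behind the hypothesised order can be made explicit with a small sketch: sorting the candidate sounds by their perceptual overlap with English [ɹ] gives the predicted order of difficulty, and the complement of the largest overlap gives the upper bound on acquisition mentioned above. The dictionary keys and variable names are illustrative only.

```python
# Overlap with English [ɹ] in the monolingual test (percent, from the text).
overlap_pct = {"l": 40.0, "w": 60.0, "r": 73.3}

# Smaller overlap means the contrast should be easier to learn,
# so sorting in ascending order gives the predicted order of difficulty.
predicted_order = sorted(overlap_pct, key=overlap_pct.get)

# Upper bound on successful discrimination from the L1 [r].
max_acquisition_pct = 100 - overlap_pct["r"]

print(" -> ".join(f"[{sound}]" for sound in predicted_order))
print(f"maximum likelihood of acquiring [ɹ]: {max_acquisition_pct:.1f}%")
```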

### 3. Research methodology

Perception and production tests were conducted with 90 adult Pakistani learners of English to test the hypotheses developed in section 2. The perception test comprised an AX discrimination task, two 3AFC discrimination tasks and an identification task. The 3AFC tasks and the AX discrimination task followed the same procedure as with the monolinguals discussed in section 2. In the identification task, the stimulus [aɹa] spoken by the native speaker of English was played to the participants, who were asked to write down in English and Urdu on a given answer sheet what sound they heard between the two vowels. They were further asked to indicate whether the sound they heard did not match any of the existing graphemes of Urdu and English. See Appendix B for the answer sheets.

The production test comprised a word-reading task. The target word was *reach* which the participants read along with some other words. Each of the words was read three times by each of the participants. The other words included in the list of the stimuli were distracters so the participants did not have an idea of the purpose of the test. The readings of the participants were recorded and out of the three repetitions, the best quality recording was provided to four native speakers of English who evaluated these productions on a Likert scale given below:


Table 2 – Scale of marking used by the native speakers.

A cut-off point of 4 on the scale is set as indicative of near-native production. Thus any production of the target sound that receives a score of 4 or above is considered a correct production of the target sound. A score of 4 (not 5) is taken as the cut-off point for learning because it is extremely rare for adult L2 learners to acquire fully native-like production. This is also why the SLM predicts that a new phonetic category for an L2 sound established by an adult learner may be deflected away from that of monolinguals of the L2 (Flege 1995:239).
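A minimal sketch of how such a cut-off could be applied is given below. It assumes that the four judges' ratings are averaged per participant, which the text does not state explicitly; the participant labels and ratings are hypothetical and are not data from the study.

```python
from statistics import mean

CUTOFF = 4.0  # from the text: a score of 4 or above counts as near-native


def is_near_native(ratings, cutoff=CUTOFF):
    """Return True if the mean judge rating reaches the cut-off.

    Averaging across judges is an assumption made for this sketch.
    """
    return mean(ratings) >= cutoff


# Hypothetical ratings from four judges on a 1-5 scale.
hypothetical_ratings = {
    "P01": [4, 5, 4, 4],
    "P02": [2, 3, 2, 3],
}

for participant, ratings in hypothetical_ratings.items():
    label = "near-native" if is_near_native(ratings) else "below cut-off"
    print(participant, mean(ratings), label)
```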

#### 3.1 Participants of the study

Three groups of learners were selected for this study with the goal of evaluating whether continued exposure improved learners' production and perception of English [ɹ]. In Pakistan, English is taught as a compulsory module to students from primary to Bachelor's level and is used as the medium of instruction in many disciplines at post-secondary level. All groups involved advanced learners who had been learning English for at least 14 years, but they differed with respect to whether they (a) actively used English, (b) specialised in English at MA level, or (c) had exposure to native speakers of English. Group (i) consisted of 30 educated adults based in Pakistan, all graduates of Pakistani universities who had specialised in subjects other than linguistics or English language. This group only uses English for academic purposes or for official correspondence. Thus we call them 'Inactive Learners' of English. Group (ii) consisted of 30 students of MA English studying English language, linguistics and literature in Pakistan. In the following discussion we shall refer to this group as the 'Student' group. Group (iii) consisted of 30 learners based in Essex (UK) who left Pakistan after obtaining their first degree there. They will be referred to as UK-based learners in the following discussion.

The participants of all groups originate from the same area; all speak Saraiki as L1 and all studied in similar types of institutions in Pakistan. The purpose of including the UK and Student learners in the study is to assess the role of native input in the former group and that of active learning in Pakistan in the latter in the acquisition of English [ɹ]. The performance of the Inactive Learners will be used for comparative analysis, as all groups of learners were similar up to BA level. Afterwards, the Student group went on to MA English courses and the UK group came to England. Thus, any better performance of the Student learners vis-à-vis the Inactive Learners will be ascribed to their active learning of English in Pakistani universities. Similarly, any improvement noted in the UK group vis-à-vis the Inactive Learners will be ascribed to the input that the former are getting in the UK.

#### 3.2 Stimuli

The stimuli were recorded in the voice of a female native speaker of English in a psycholinguistic laboratory at the University of Essex. The target consonants were recorded with a low vowel on each side, i.e. [aɹa] etc. The stimulus for Saraiki [r] was recorded in the same form, i.e. [ara], in the voice of a female native speaker of Saraiki. These stimuli were used in the perception test. The methodology used for these tests was the same as discussed in section 2.

#### 4. Presentation of data

In this section the results of the perception and production tests are presented separately. The perception test results are presented first followed by the production test results.

#### 4.1 Perception of [ɹ]

As mentioned above, the perception test consisted of an identification task, an AX discrimination task and two 3AFC discrimination tasks. Table 3 shows the perception test results in percentages. The results show that in the identification task and in the 3AFC-1 task the UK group performed better than the Student group, who in turn performed better than the Inactive Learners group. However, in the 3AFC-2 discrimination task, the performance of all three groups is equally good. In the AX discrimination task, the Inactive Learners group performed better than the other two groups, in contrast to the trend seen for the identification and 3AFC-1 discrimination tasks. However, the overall performance of all the groups is poor in the AX discrimination test. A non-parametric test confirms the group variance as statistically significant in the identification task (χ<sup>2</sup> =17.603, p<.001), the 3AFC-1 discrimination task (χ<sup>2</sup> =13.075, p<.001), and the AX discrimination task (χ<sup>2</sup> =9.068, p<.01). The increasing trend in the performance of the groups is also significant (p<.001). However, group variance in the 3AFC-2 discrimination task is non-significant (p>.1).


Table 3 – Accuracy (in percentage) in perception test.
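The text does not name the non-parametric test behind the χ<sup>2</sup> values. As one possible illustration only, the sketch below computes Pearson's chi-square statistic for a groups × (correct, incorrect) contingency table; both the choice of test and the counts are assumptions made here and do not reproduce the study's analysis or data.

```python
def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table.

    `table` is a list of rows of observed counts, e.g. one row per
    learner group and one column each for correct/incorrect responses.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical counts (correct, incorrect) for three groups of 30.
hypothetical_counts = [
    [12, 18],  # Inactive Learners
    [20, 10],  # Student group
    [26, 4],   # UK-based group
]

chi2, df = pearson_chi_square(hypothetical_counts)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

With three groups and a binary outcome the statistic has 2 degrees of freedom; the p-value would then be read from the chi-square distribution.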

#### 4.2 Production of [ɹ]

The production test was based on a word-reading task. Four native speakers of English evaluated the productions. The overall reliability in evaluation by the judges was 62% (Cronbach's alpha=.622). The following are the average scores obtained by the participants for the production of English [ɹ] in the word *reach.*  The standard deviations are given in parentheses.


Table 4 – Average scores in the production of [ɹ].
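The reliability coefficient reported above, Cronbach's alpha, can be computed from a participants × judges matrix of ratings. The sketch below uses the standard formula, α = k/(k−1) · (1 − Σ item variances / variance of total scores), where k is the number of judges; the ratings shown are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for rows of ratings (one row per participant,
    one column per judge), using sample variances throughout."""
    k = len(scores[0])
    judge_columns = list(zip(*scores))
    sum_item_variances = sum(variance(col) for col in judge_columns)
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical ratings by four judges on a 1-5 scale.
hypothetical_scores = [
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [1, 2, 2, 1],
    [3, 2, 3, 4],
    [5, 4, 4, 5],
]

print(f"alpha = {cronbach_alpha(hypothetical_scores):.3f}")
```

When all judges agree perfectly the coefficient equals 1; disagreement among judges lowers it.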

A one-way ANOVA shows significant group variance (F(2,87)=7.165, p<.001)<sup>1</sup>, but a post-hoc analysis only confirms a difference between the UK and Inactive learners (p<.001). The results show that the learners did not perform well in this test. None of the groups obtained an average score of 4, which was fixed as the minimum cut-off point for learning. Although the scores only indicate the relative performance of the participants in the production of the target sound (not the actual nature of the consonant produced), later acoustic analysis shows that the learners produced English [ɹ] as the L1 rolled [r]. The results of the perception and production tests are analyzed and discussed in the following section.
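The degrees of freedom of the reported F statistic follow from the design: three groups give 3 − 1 = 2 between-group degrees of freedom, and 90 participants give 90 − 3 = 87 within-group degrees of freedom. The sketch below computes a one-way ANOVA F statistic from scratch; the scores are hypothetical placeholders, not the study's ratings.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    group_means = [mean(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores: three groups of 30 participants each.
hypothetical_groups = [
    [base + 0.25 * (i % 5) for i in range(30)]
    for base in (2.0, 2.5, 3.0)
]

f_stat, df_b, df_w = one_way_anova(hypothetical_groups)
print(f"F({df_b},{df_w}) = {f_stat:.3f}")
```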

#### 5. Analysis and discussion

The production test results show that the learners have very poor production of English [ɹ]: none of the groups obtained an average score of 4, which is the cut-off point for considering them as having acquired the target sound. The perception test results show that the performance of all groups, including the Inactive learners, is excellent in the discrimination of [ɹ] from [l], which indicates that the learners can discriminate English [ɹ] from [l] from the early stages of learning. The reason for including the [l] ~ [r] contrast in the perception test was to evaluate how well Pakistani learners can discriminate the two sounds, since previous research on some L2 learners of English has shown perceptual assimilation of [r] with [l] (e.g. Brown 1998, 2000; Flege et al. 1996; Larson-Hall 2004). In the identification and 3AFC-1 discrimination tests, the UK and Student participants performed better than the Inactive learners. In 3AFC-2, all three groups performed equally well. The 3AFC-1 test was based on discrimination between [ɹ] and [w j] and the 3AFC-2 test on discrimination between English [ɹ] and [l n]. This means that both the Student and UK learners have learnt to discriminate [ɹ] from [j w l n], whereas the Inactive group has learnt to discriminate it from [l n]. However, in the AX discrimination test, all participants are poor as they cannot perceive the difference between English [ɹ] and the corresponding L1 [r].

<sup>1</sup> A Kolmogorov-Smirnov test confirms the normal distribution of the data (p>.05).

This performance of the learners corresponds with that of the Saraiki monolinguals, who also assimilated English [ɹ] with [l], [w] and L1 [r] (see Table 1). However, the results show that although the L2 learners have difficulty acquiring English [ɹ] in the initial stages, some learning must have occurred, as reflected in the improved performance of the 3 groups of learners. The performance of the 3 groups reveals a particular directionality of learning. The Inactive group, who make the least use of English, have learnt to perceive the difference between [ɹ] and [l] but are not able to discriminate English [ɹ] from [w] and L1 [r]. The UK and Student groups learnt to differentiate English [ɹ] from [l] and [w] with an accuracy of 80% or above (see Table 3). The two groups could not, however, discriminate between English [ɹ] and L1 [r] and only have an accuracy rate of <34% for this contrast. These subjects performed well in the identification task and the 3AFC discrimination tasks because these tasks involved their ability to differentiate English [ɹ] from all the other consonant sounds of English. But the AX discrimination task results show strong equivalence classification between English [ɹ] and L1 [r] in the L2 phonemic inventory of these learners. As a result they produced the approximant English [ɹ] as a rolled [r] as in the L1 (explaining the poor scores they received in the production task). The overall results show a clear learning pattern with respect to the discrimination of English [ɹ] from [w], [l] and L1 [r]. The directionality of difficulty for the learners (from least to most difficult) is as given below:

$$[\mathrm{l}] \to [\mathrm{w}] \to \text{L1 } [\mathrm{r}]$$

Thus Pakistani learners first learn to discriminate English [ɹ] from [l] (as the performance of all participants shows) followed by the discrimination of [ɹ] from [w] based on training and greater input (see the performance of the Student and UK group). The greatest difficulty comes from the discrimination of English [ɹ] from the L1 [r] which even the UK-based group with the input from native speakers cannot overcome. The most advanced Pakistani learners are therefore only able to develop separate representations for English [ɹ] from [w] and [l]. We can depict the emerging learning process in the 3 groups in Figure 1.

Figure 1 – Development of discrimination between L2 [ɹ] and L1 [w].

The above figure shows that in the L2 phonemic inventory of the Inactive learners [ɹ/r] and [w] overlap to a large extent, while this is less so in the other groups, who manage to separate the two sounds and mainly treat them as separate categories. The UK group fares best in the separation, while the Student group can be predicted to show more variable discrimination because of the higher overlap.

The above results are based on collective group performance. If we consider individual performance and use 4 as the near native-like performance cut off point in the production test then there are 3 UK-based participants who have a near native-like performance in the production and perception of English [ɹ]. These 3 participants perceived English [ɹ] accurately in all repetitions of all the perception tasks and also obtained a score of 4 in production task. We can conclude that only 3 UK-based participants developed an independent phonetic category for English [ɹ]. This is illustrated in the following figure which contains two spectrograms of the word *reach* as produced by one of the 3 native-like participants (left spectrogram) and by another participant who is as yet unable to discriminate between English [ɹ] and L1 [r] (right spectrogram).

Figure 2 – Spectrograms of the word 'reach'.

The left-hand spectrogram shows that the participant who was able to discriminate English [ɹ] from the L1 [r] produced the word *reach* with an approximant gesture word-initially but the participant who could not discriminate between English [ɹ] and the L1 [r] produced the English [ɹ] in the word *reach* with a tap or trill as the right hand side spectrogram shows. Besides, on the pattern of the L1 [r], the participant has also added a vocalic gesture in the beginning of the word *reach* virtually producing the word *reach* as [əritʃ]. This demonstrates that most of the learners could not acquire approximant [ɹ] in English; some of them even failed to suppress the epenthesis of initial vocalic gesture in the words of English starting with [r] (a phenomenon transferred from the L1). The epenthetic vowel in the beginning of the word *reach* produced by the participant is clearly reflected in the following waveform highlighted in a rectangular box. This is an example of negative transfer from the L1 as a result of a strong equivalence classification between L2 [ɹ] and L1 [r].

Figure 3 – Waveform of the word *reach* by one of the participants.

#### 6. Conclusion

This paper reported on an experiment testing whether Pakistani learners of English acquire English [ɹ] accurately or assimilate it with [w], [l], or the L1 [r]. The results show that although there has been some progress in the acquisition of English [ɹ], the learners have not accurately acquired it, even though there are individual participants who show that such acquisition is possible. On the basis of the results from the 3 groups we are able to map out a clear developmental path in the discrimination of English [ɹ] from the closest sounds, namely [l], [w] and L1 [r]. The group with the least exposure to English after classroom learning (Inactive Learners) shows the least acquisition and is only able to discriminate English [ɹ] from [l]. The intermediate group in terms of exposure (Student learners in Pakistan), who have a higher English usage than the first group because they have specialised in English at MA level, are better at the discrimination of English [ɹ] from [l] and can also discriminate it from [w]. The most advanced group, with more systematic day-to-day exposure to English in the UK, have better results overall, even though they still fall short of the accurate acquisition of English [ɹ].

The overall developmental path attested is parallel to the performance of the Saraiki monolinguals, who showed variation in the discrimination of English [ɹ] from the closest sounds, with accuracy gradually declining from [l] (60%) to [w] (40%) to [r] (26.7%). This supports the idea of the SLM that "the greater the perceived phonetic dissimilarity between an L2 sound and the closest L1 sound, the more likely it is that phonetic differences between the sounds will be discerned". The SLM is further supported by an individual analysis of the results, which shows that the three participants of the UK-based group who could perceive a difference between English [ɹ] and the closest sounds, including L1 [r], are also able to produce English [ɹ] accurately.

There are two outstanding issues. The first concerns why the Inactive Learners group, with the least exposure to English, performed better than the other two groups in the AX discrimination task (see Table 3). This may be better considered within a sociolinguistic framework rather than that of second language acquisition. It might well be that people employing English for professional purposes are more aware of the difference between their own and the native pronunciation, without being able to reproduce it. In this respect 'Inactive learners' are really inactive in their production, i.e. fossilized with respect to the other two groups currently exposed to different kinds of input, in that they cannot produce the L2 sound differently from the closest L1 sound although most of them perceive the difference between the L1 and L2 consonant.

The second issue is that of the insertion of an epenthetic vowel at the beginning of words starting with [r] in Saraiki and its implications for the acquisition of the L2. In this regard my point of view is that at some stage of their historical development Indo-Aryan languages did not accept word-initial consonants (Masica 1993). At that stage all words started with vowels. Later on, these languages started accepting consonants word-initially, but as a remnant of the older pattern speakers added a vocalic gesture or schwa-like insertion at the beginning of words starting with consonants. The epenthesis of a vowel before sonorants and the strong pre-voicing of obstruents in Indo-Aryan languages like Saraiki are remnants of that period of language history. However, both these issues need further investigation and are left for future research.

#### Acknowledgements

My special thanks are due to Nancy C. Kula (University of Essex) who guided me thoroughly in this study. Also, I am thankful to the anonymous reviewer for valuable comments.

#### References



#### Appendix A: Phonemic inventory of Saraiki<sup>2</sup>

<sup>2</sup> Shackle (1976:18) does not include the breathy voiced alveo-palatal nasal in the consonantal inventory of Saraiki but the sound does exist in the language. Examples are words like /kaɲ<sup>h</sup>ã/ 'late' and /mãɲ<sup>h</sup>ar/ 'castrated'.

#### Appendix B: Answer sheets

#### **1: Answer sheet for the identification test**

**Instructions for the participants:** You will listen some consonants of English each flanked by a long vowel [a] on both sides. After listening the consonants, just note in the blank space provided in the sheet the consonant you have heard between two a's. Also note the same consonant in Urdu in the next column. If the sound does not exist in either of the languages, please point out in column three of the sheet.


#### **2: Answer sheet for 3AFC discrimination test**

**Instructions for the participants:** First the target sound will be played. After a pause a pair of sounds will be played. If the first sound of the pair matches the target, tick in column A of the answer sheet, if the second one matches the target sound tick in column B and if neither of the sounds matches with the target sound, cross (×) in column C.


#### **3: Answer sheet for the AX discrimination test**

**Instructions for the participants:** Please listen to the pairs of sounds and determine if the consonants in the sounds are identical or different by ticking in the relevant column. Please ignore the difference in tone, pitch and intonation of the speakers and decide only on the basis of the consonant between two vowels.


## On rhotics in a bilingual community: A preliminary UTI research

#### Lorenzo Spreafico & Alessandro Vietti, Language Study Unit, Free University of Bozen-Bolzano

#### Abstract

In this paper we offer an Ultrasound Tongue Imaging (UTI) based description of rhotics in bilingual speakers from South Tyrol. In particular, we examine whether adult Italian/Tyrolean bilinguals display differentiated patterns of articulation for rhotics in each language they speak, and whether bilinguals' articulatory patterns in each examined language are similar to those used by almost monolingual speakers. Intraspeaker comparison shows that very late sequential bilinguals do not present distinct articulatory patterns for rhotics in the two languages, while simultaneous bilinguals do. Moreover, interspeaker comparison shows that the articulatory patterns for rhotics used by simultaneous bilinguals differ from those used by the very late sequential bilingual speakers. These data help to understand how phonological categories are organized by bilinguals, and address the long-debated issue of whether bilinguals make use of a single shared phonological system or of two separate ones.

#### 1. Introduction

#### 1.1 Background

This study<sup>1</sup> is part of a project aimed at collecting a socially-stratified articulatory corpus using the UTI technique. The participants included in the database are bilingual speakers of Italian and of Tyrolean as they are spoken in South Tyrol. From a sociolinguistic point of view, South Tyrol is characterized by societal bilingualism with two quite separate linguistic communities: Tyrolean and Italian. These two communities exhibit marked asymmetries in their linguistic repertoires (Table 1). The linguistic repertoire of the members of the Tyrolean community is characterized by a medial diglossia, with Tyrolean – a southern Bavarian dialect (Wiesinger 1989; Barker 2005) – in low position, and Standard German in high position (Ciccolone 2010; Lanthaler 1990). Moreover, the repertoire of the German community very often includes Italian, especially among speakers with a middle-to-high level of education living in main towns such as the capital city Bozen-Bolzano.

In contrast, the members of the Italian community are not markedly bilingual with respect to Tyrolean, although they are likely to display fair competence in an Italo-Romance dialect, especially if they belong to an older generation, or in Standard German, especially if they belong to the younger generation and learnt it at school.


Table 1 – Linguistic repertoires in South Tyrol.

#### 1.2 Rhotics in South Tyrol

What are the consequences of this situation on the phonetics and phonology of Italian and Tyrolean as they are spoken in South Tyrol? Unfortunately research on this topic is scant and actually limited to one volume (Tonelli 2002). Even scanter however are investigations offering data on rhotics. As for Italian spoken in the area we can refer to auditory investigations by Mioni (1990, 2001), Canepari (1990), Tonelli (2002) and to instrumental investigation by Vietti, Spreafico & Romano (2010), Spreafico & Vietti (2010), Vietti & Spreafico (2010) and Spreafico & Vietti (2011). As for Tyrolean, interesting exceptions are Klein & Schmitt (1969) and again Tonelli (2002).

Mioni's (1990) investigation limits itself to the utterances in Italian produced by informants living in the cities, and in particular it focuses on monolingual and bilingual students. As regards rhotics in Italian monolinguals, he affirms that the apicoalveolar tap usually prevails. As for the bilinguals, the author reports that all his informants (with no significant distinctions) use some sort of uvular rhotic, which, as far as he is concerned, reveals an influence of the Bavarian dialect substratum and, in a way, indexes speakers' ethnicity<sup>1</sup>. In contrast, on the basis of auditory analyses (of supposedly monolinguals' utterances only) Canepari (1990) reports on the tendency of using uvular pronunciation (e.g. [ʀ; ʁ̞]), which at times can even be accompanied by alveo-uvular pronunciations. Yet Tonelli (2002) shows that the only variant of the /r/ sound to be found in an Italian sample (again comprising monolinguals only) living in Bolzano is [ɾ], which is sometimes, and in marked pronunciation only, replaced by [r].

<sup>1</sup> This becomes even more evident if one takes into account that, as reported by Mioni (2001), the Italian phonology in these informants is properly acquired and it is substantially the same as the one used by the Italian native speakers around them.

Vietti & Spreafico (2010) offered a different picture of this phenomenon. They acoustically analyzed the type of /r/ realizations in Italian productions by South Tyrolean informants and pointed out that sometimes both apical and uvular realizations can be detected in utterances and even in isolated words produced by the same informant. They examined a sample of 11 speakers and about 500 occurrences and showed that their informants make use of many more allophones than those documented in previous research: [ɾ]<sup>2</sup>; [ρ]<sup>3</sup>; [ʁ̞]; [ʀ]; [χ]; [r]; [ɽ]; [ʐ]; [ɻ]; [ʁ]. In addition, they identified several instances of deletion, as well as other phones that could hardly be categorized, mostly because the acoustic and auditory data were contradictory.

Systematic research on rhotics in South Tyrolean is sparse and limited to the information provided by the *Tirolischer Sprachatlas* (Klein & Schmitt 1969). The /r/ realizations recorded in Klein & Schmitt (1969) reveal considerable diatopic variation, with salient differences across the broader area of South Tyrol<sup>4</sup>. For example, the analysis of some of the maps in the volume on *Konsonantismus, Vokalquantität, Formenlehre* for the capital city of Bozen-Bolzano shows that uvular articulations are registered in six out of nine cases<sup>5</sup>, while apicoalveolar articulations are reported for the rest. The alternation between front and back realizations also seems to affect the so-called *Bozner Deutsch*, which, according to Tonelli (2002), is characterized by [ʀ] and exceptionally by [r]. These observations are consistent with those reported in studies on bordering areas, such as Ulbrich & Ulbrich (2007) on Austrian German: they note that the spectroacoustic analysis of newsreaders' productions reveals a prevailing use of uvular realizations in onset position (especially [ʀ] and [ρ], but also [χ] and [ʁ], which may be due to backing phenomena) and mainly vocalized variants of /r/ in coda position, although not excluding apical articulation.

<sup>2</sup> Both tap and – to a lesser extent – flap [ɾ].

<sup>3</sup> Uvular tap. This sound, unknown in the IPA, is transcribed by the symbol [ρ] according to a proposition made by Demolin et al. (ms).

<sup>4</sup> E.g. deletions and apical realizations in the Western Pustertal *versus* uvular trills in the Eastern Pustertal.

<sup>5</sup> Uvular articulations are reported for: *Durst*, map 50; *Wurst*, map 51; *Werden*, map 58; *Hertz*, *Fertig*, *Wird*, map 54. Apicoalveolar articulations are registered for: *Feuer*, *Bauer*, *Bauertag*, map 91. It is worth noticing here that there seems to be an isogloss running NE-SW along the Eisack Valley separating /ʀ/ dialects in the West from /r/ dialects in the East.

The brief discussion offered above clearly shows the lack of systematic investigation of both Italian and the Tyrolean dialect with respect to rhotics. This research therefore also contributes to filling the gap, as it offers a preliminary instrumental description.

#### 2. Methods

#### 2.1 Informants

In order to answer the research questions of whether adult Italian/Tyrolean bilinguals display differentiated patterns of articulation for rhotics, and of whether the patterns of articulation in adult bilinguals are similar to those of monolingual speakers, we collected a socially-stratified articulatory corpus using the UTI technique (Stone 2005; Iskarous 2005; Davidson 2012).

The nineteen informants included in the database are bilingual speakers of Italian and of Tyrolean as spoken in South Tyrol. They are all in their mid-30s and were born and raised in Bozen-Bolzano, the capital city of South Tyrol. Initially a questionnaire was used to determine the participants' length and amount of exposure to the two languages. Building on that, each informant was assigned to one of four groups on a bilingualism discretum scale: simultaneous bilinguals, early sequential bilinguals, late sequential bilinguals and very late sequential bilinguals.

The assignment was based mostly on two parameters: the rate of bilingualism in the family, that is whether the informant's parents were native speakers of the same language or not, and the rate of dual language exposure, in other words whether the informant had been in contact with Italian and the Tyrolean dialect from birth, from nursery school on, from primary school on or from secondary school on only (as shown by Simonet 2010 for Catalan)<sup>6</sup>.

In order to control for the real exposure to the two languages and to obtain a better understanding of the sociolinguistic *milieu* and hence of the sociophonetic environment each informant was inserted into (Khattab 2002), we collected social network data for each speaker using an egocentric approach which examines individuals' immediate neighbors and associated interconnections (Milroy & Milroy 1985; Scott 2000). This allowed us to assess the amount of Italian or Tyrolean each speaker was exposed to and actually resorted to in his/her daily life.

<sup>6</sup> It is important to note here that South Tyrol has a split school system with segregated Italian and German schools and that in the latter case lessons are supposed to be taught in Standard German and not in Tyrolean. This means that in South Tyrol, Tyrolean can be acquired via spontaneous interactions only, whereas Italian can also be learnt via formal instruction.

By linking the data from the questionnaire with those from the egocentric social network we were able to include in the corpus 8 simultaneous bilinguals, 3 early sequential bilinguals, 4 late sequential bilinguals and 4 very late sequential bilinguals.

In this paper we only focus on the analysis of rhotics as they are articulated by seven speakers out of the nineteen we recorded, namely those belonging to the opposite poles of the discretum (see Table 2): two very late sequential bilinguals (LSB) and five simultaneous bilinguals (SB). The two very late sequential bilinguals each grew up in a strictly monolingual family, an Italian one (LSB1, female) and a Tyrolean one (LSB2, female) respectively, and, according to data from their social network at the time of our recording, had almost no contact with members of the other language community.

On the other hand, the simultaneous bilingual speakers SB1 (male), SB2 (male), SB3 (male), SB4 (female) and SB5 (female) came from bilingual families (in the sense that each of their parents was a native speaker of one of the two languages), attended both Italian and German schools, and, according to their egocentric network, kept up relationships equally with members of the two language communities.


Table 2 – Speakers' rate of interaction (%) in each language or combination of languages for their last 10 encounters during the day of data collection. Information retrieved via the EgoNet software (McCarthy 2011). \*Not all logical combinations reported, total might differ from 100%.

#### 2.2 Procedure

For data collection, we used the Articulate Instruments multichannel acquisition system called Articulate Assistant Advanced (AAA) (Articulate Instruments 2011).

Articulatory data was recorded using a portable SonoSite 180 ultrasound machine equipped with a SonoSite ICT intracavitary array transducer operating at 4-7 MHz. The frame rate was automatically and unchangeably set at 15 Hz; the depth was autonomously set at 7 cm; the field of view was 120°. The probe was held by a stabilizing helmet to make sure that it adhered to the speaker's chin and was kept in constant relationship to the speaker's palate.

Acoustic data was recorded at 22,050 Hz using a Marantz PMD660 recorder coupled with a Beyerdynamic MCE86N microphone. The audio signal exiting from the recorder was synchronized to the video signal coming from the ultrasound machine via the SyncBrightUp™ unit (Articulate Instruments 2011). This device was triggered by an audio beep generated by AAA upon pressing the start recording button. The software then superimposed a white mark on the video signal and generated a sync pulse used to synchronize the audio and video signals during the analysis.

Overall 38 written prompts were presented to each informant via a PC monitor. At first two test words were presented to the speakers to acquaint them with the procedure. Then two word-lists were presented to the participants, one in Italian and one in Tyrolean<sup>7</sup>. Each list contained 18 randomly arranged target words beginning with a CRV sequence of the kind: plosive plus rhotic plus high or low vowel (see Appendix 1). These sequences were chosen to control the high contextual variability of /r/ already observed in Vietti & Spreafico (2008), so as to allow a better comparison of static articulations in the two languages, as well as to allow an analysis of coarticulation phenomena in onset clusters<sup>8</sup>.

In addition to the target words, each list contained two distractors used to urge informants into swallowing some water or eating some pudding. That was needed to collect palate images of a decent quality that could serve as reference for the subsequent analysis. Each sequence of written prompts was submitted in the same order to the informants three times, so in the end we were able to record 114 words for each speaker. That was needed to ensure that, notwithstanding the slow and unalterable scan rate of 15 Hz, at least one image of the tongue during the short constriction phase could cleanly be imaged for each of the eighteen CRV sequences in the two languages.
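The rationale for reading each prompt sequence three times can be illustrated with a rough calculation. The sketch below is only an illustration: it assumes a constriction lasting 25 ms (a hypothetical value, not one measured in this study), a frame whose timing is uniformly random with respect to the constriction, and independent repetitions. Only the 15 Hz scan rate, the 38 prompts and the three repetitions are taken from the text.

```python
# Rough sketch: chance that a 15 Hz ultrasound scan images a short constriction.
# ASSUMED_CONSTRICTION_S is a hypothetical illustration value, not a study result.

FRAME_RATE_HZ = 15.0            # fixed scan rate reported in the text
PROMPTS_PER_ROUND = 38          # written prompts per informant (from the text)
REPETITIONS = 3                 # each prompt sequence was read three times
ASSUMED_CONSTRICTION_S = 0.025  # hypothetical closure duration of a tap


def frame_interval_s(frame_rate_hz: float) -> float:
    """Time between two consecutive frames, in seconds."""
    return 1.0 / frame_rate_hz


def p_hit_single(duration_s: float, frame_rate_hz: float) -> float:
    """Probability that a frame falls inside an event of the given duration,
    assuming the frame phase is uniformly random relative to the event."""
    return min(1.0, duration_s * frame_rate_hz)


def p_hit_any(duration_s: float, frame_rate_hz: float, repetitions: int) -> float:
    """Probability of at least one hit over independent repetitions."""
    miss = 1.0 - p_hit_single(duration_s, frame_rate_hz)
    return 1.0 - miss ** repetitions


if __name__ == "__main__":
    print(f"words per speaker: {PROMPTS_PER_ROUND * REPETITIONS}")
    print(f"frame interval: {frame_interval_s(FRAME_RATE_HZ) * 1000:.1f} ms")
    print(f"single reading: {p_hit_single(ASSUMED_CONSTRICTION_S, FRAME_RATE_HZ):.3f}")
    print(f"three readings: {p_hit_any(ASSUMED_CONSTRICTION_S, FRAME_RATE_HZ, REPETITIONS):.3f}")
```

Under these assumptions a single reading would capture the constriction in well under half of the cases, while three readings roughly double that chance. The actual figures depend on the true constriction duration, which this sketch does not claim to know.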

<sup>7</sup> Since there are no common writing conventions for Tyrolean, and spellings are inherited and customized from Standard German, informants were allowed to examine a printed copy of the word list before the test to be sure they would recognize all the forms it contained.

<sup>8</sup> We leave this matter for future research.

All speakers were individually recorded in a soundproof room, and whenever possible two researchers at a time attended the data collection session and interacted with the informants. This was arranged so that both a native speaker of Italian and a native speaker of Tyrolean were present at the same time, so as to ensure a truly bilingual environment and to have the informant in the bilingual mode (Grosjean 1998).

For data analysis, we ran a parallel auditory<sup>9</sup>/articulatory analysis based on the audio records and on the synchronized mid-sagittal ultrasound images of the tongue. The /r/ tokens were coded for one of seven categories: four dorsals (trill, tap, fricative, approximant); two coronals (trill, tap); and deletion.

Then we semi-automatically fitted the mid-sagittal tongue surface using AAA (version 2.13), which also allowed for manual correction of the splines. If we could draw more than one spline traceable back to the same rhotic, we exported only the one corresponding to the closure phase for trills and taps or to the medial one for fricatives and approximants. Finally, we transferred the curves drawn onto the raw ultrasound image in Cartesian coordinates to a spreadsheet as the basis of a qualitative analysis.

#### 3. Data

#### 3.1 Data analysis

Of the 756 rated tokens, only 585 were included in the analysis (Table 3). Problems in tongue imaging common to most UTI research<sup>10</sup>, such as discontinuities in the surface contour due to asynchronies between the scan rate and the frame rate, as well as shadows cast by the hyoid bone or the jaw, or ultrasound refraction, forced us to discard many tokens. This especially held for SB1, for whom we were only able to extract 22 out of 54 profiles in Tyrolean and 47 in Italian<sup>11</sup>.
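The token counts above can be cross-checked against the design described in Section 2. The short sketch below is an illustrative calculation only; the constant names are ours, and it assumes that the 756 rated tokens correspond to seven speakers reading 18 target words in two languages three times each.

```python
# Illustrative cross-check of the token counts reported in the text.
SPEAKERS = 7            # speakers analysed in this paper
LANGUAGES = 2           # Italian and Tyrolean
TARGETS_PER_LIST = 18   # CRV target words per list
REPETITIONS = 3         # readings of each list

RATED_TOKENS = SPEAKERS * LANGUAGES * TARGETS_PER_LIST * REPETITIONS
TOKENS_PER_SPEAKER_AND_LANGUAGE = TARGETS_PER_LIST * REPETITIONS
ANALYSED_TOKENS = 585   # reported in the text


def share(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`."""
    return 100.0 * part / whole


if __name__ == "__main__":
    print(f"rated tokens: {RATED_TOKENS}")
    print(f"tokens per speaker and language: {TOKENS_PER_SPEAKER_AND_LANGUAGE}")
    print(f"overall retention: {share(ANALYSED_TOKENS, RATED_TOKENS):.1f}%")
    print(f"SB1 Tyrolean retention: {share(22, TOKENS_PER_SPEAKER_AND_LANGUAGE):.1f}%")
    print(f"SB1 Italian retention: {share(47, TOKENS_PER_SPEAKER_AND_LANGUAGE):.1f}%")
```

On this reading, a little over three quarters of the rated tokens were retained overall, while for SB1 well under half of the Tyrolean profiles could be extracted.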

<sup>9</sup> Although the classification was auditory, spectrograms were also used to support it.

<sup>10</sup> Relevant UTI works on rhotics include, among the many others, Iskarous et al. (2010); Lawson et al. (2008); Proctor (2009); Scobbie & Sebregts (2011).

<sup>11</sup> Apparently in Tyrolean the tongue assumed a position that differed from that displayed during the instrumentation set-up, which was based on inter-utterance rest positions, and hence caused the tongue to parallel the beam orientation, thus refracting the ultrasound. In Italian the phenomenon was rarer, which raises the more general issue of language-specific articulatory settings (Gick et al. 2004).


Table 3 – Analyzed tokens per speaker; percentage of coronal rhotics and major allophone in each language.

#### 3.2 Auditory analysis

Table 3 above contains data on the auditory analysis we ran and reports on the number of tokens, the percentage of coronal rhotics and the most frequent allophone for each speaker in the two languages.
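As an illustration of how the three quantities reported in Table 3 could be derived from coded tokens, the sketch below uses the seven categories listed in Section 2.2. It is a minimal sketch, not the procedure actually used: the label strings and the function `summarise` are hypothetical, and counting deletions in the denominator of the coronal percentage is an assumption.

```python
from collections import Counter

# Seven coding categories from the text, mapped to place of articulation.
# The label strings themselves are hypothetical.
PLACE = {
    "dorsal trill": "dorsal",
    "dorsal tap": "dorsal",
    "dorsal fricative": "dorsal",
    "dorsal approximant": "dorsal",
    "coronal trill": "coronal",
    "coronal tap": "coronal",
    "deletion": None,
}


def summarise(tokens):
    """Return (number of tokens, percentage of coronals, most frequent category).

    Assumption: deletions stay in the denominator of the percentage.
    """
    if not tokens:
        raise ValueError("no tokens to summarise")
    unknown = set(tokens) - set(PLACE)
    if unknown:
        raise KeyError(f"unknown categories: {sorted(unknown)}")
    counts = Counter(tokens)
    coronal = sum(n for cat, n in counts.items() if PLACE[cat] == "coronal")
    major = counts.most_common(1)[0][0]
    return len(tokens), 100.0 * coronal / len(tokens), major


if __name__ == "__main__":
    # Made-up tokens for one speaker in one language.
    demo = ["coronal tap"] * 3 + ["dorsal fricative"]
    print(summarise(demo))
```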

It was evident from our analysis that all speakers but LSB1 resorted to a uvular consonant (mostly [χ]) to read the Tyrolean words. As far as Italian words were concerned, however, both uvular and apical rhotics were attested, since SB4 and SB5 switched between the two places of articulation according to the language the prompts belonged to.

It also emerged from the auditory analysis that none of the speakers we considered alternated between coronal and dorsal variants within the same language, and that in Tyrolean no allophones besides [χ, ʁ̞, ʁ] were used, while in Italian [r] also occurred.

#### 3.3 Articulatory analysis

#### 3.3.1 Intraspeaker comparison

In order to assess whether adult bilinguals display one or two patterns of articulation for rhotics in Italian and Tyrolean respectively, we considered at first the static articulations of the two very late sequential bilinguals LSB1 and LSB2, namely an almost monolingual speaker of Italian and an almost monolingual speaker of Tyrolean, and ran an intraspeaker comparison of their tongue profiles. Our analysis was based on impressionistic observations on the shape and position of the tongue, as well as on the statistical comparison of tongue splines.

The impressionistic, graphic analysis of LSB1's data reported in Fig. 1 shows that in each of the nine CRV sequences we considered ([k, g, t, d, p, b | r | u, a, i]) there is no strong categorical distinction between tongue shape and position in the two languages and that the two splines almost always coincide.

Figure 1 – Tongue shapes for r-sounds in LSB1. See Fig. 2 for the explanation of colors.

Figure 2a – LSB1, mean tongue shapes for *r*-sounds. Figure 2b – LSB1, radar chart of the t-test.

The green line at the top of the image always represents the palate, whereas the blue and the red line at the bottom represent shape and position assumed by the tongue in Italian and in the Tyrolean dialect respectively; tongue tip and blade are right, tongue root is left. As a mere means of orientation in the radar chart, groups of spokes can stand for the places of articulation in reference to the upper surface of the vocal tract. In a clockwise direction approximately they are: spokes 7 to 13 alveolar ridge; 14-20 hard palate; 21-25 soft palate (velum); 26-30 uvula; 31-35 pharynx.
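The approximate spoke-to-region correspondence given for the radar charts can be written down as a small lookup, which may help when reading which parts of the vocal tract a set of significant spokes refers to. This is only a sketch: the function names are hypothetical, and the boundaries are the approximate ones stated in the text, not exact anatomical limits.

```python
# Approximate spoke groups as stated in the text (inclusive ranges).
SPOKE_REGIONS = [
    (7, 13, "alveolar ridge"),
    (14, 20, "hard palate"),
    (21, 25, "soft palate (velum)"),
    (26, 30, "uvula"),
    (31, 35, "pharynx"),
]


def region_of(spoke: int):
    """Return the region name for a spoke, or None if it lies outside the listed groups."""
    for first, last, name in SPOKE_REGIONS:
        if first <= spoke <= last:
            return name
    return None


def count_by_region(spokes):
    """Count how many of the given spokes fall into each listed region."""
    counts = {name: 0 for _, _, name in SPOKE_REGIONS}
    for spoke in spokes:
        name = region_of(spoke)
        if name is not None:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    # Made-up example: spokes at which two splines were found to differ.
    print(count_by_region([8, 9, 27, 28, 29, 33]))
```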

Fig. 2a depicts the averaged spline calculated from the subset of splines associated with a rhotic sound in each of the two languages and shows that the main body of the tongue is held convex to the palate, with the antero-dorsum straight and steeply rising and the tip down, pointing to the alveolar ridge on the roof of the mouth, thus defining a constriction in the post-alveolar area and producing almost always an alveolar tap in both languages as attested by the auditory analysis.

The initial impression of similarity between the two tongue profiles is confirmed by the statistical analysis, which is based on the calculation of a t-test<sup>12</sup> for each spoke between the two splines via the AAA integrated tool and is rendered here in a radar chart where the greater the distance between the two lines, the greater the difference between the two splines (Fig. 2b).
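For readers unfamiliar with the test, the per-spoke comparison can be sketched as follows. This is not the AAA implementation, whose internals are not described here; it is a minimal illustration of a Welch t-test (unequal variances and sample sizes, with Welch-Satterthwaite degrees of freedom) applied spoke by spoke. The function names and the data layout (a dictionary from spoke index to a list of radial distances) are assumptions made for the example.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each sample needs at least two values, and the two samples must not
    both have zero variance.
    """
    na, nb = len(a), len(b)
    va = variance(a) / na   # squared standard error of the mean of a
    vb = variance(b) / nb   # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def per_spoke_welch(italian, tyrolean):
    """Run the test on every spoke present in both languages.

    `italian` and `tyrolean` map a spoke index to the radial distances
    measured along that spoke in the individual splines (hypothetical layout).
    """
    shared = sorted(set(italian) & set(tyrolean))
    return {spoke: welch_t(italian[spoke], tyrolean[spoke]) for spoke in shared}


if __name__ == "__main__":
    # Made-up distances (mm) for two spokes.
    italian = {20: [41.0, 42.5, 40.8, 41.9], 21: [39.0, 38.5, 39.4, 38.8]}
    tyrolean = {20: [41.2, 42.0, 41.5], 21: [36.1, 35.8, 36.6]}
    for spoke, (t, df) in per_spoke_welch(italian, tyrolean).items():
        print(f"spoke {spoke}: t = {t:.2f}, df = {df:.1f}")
```

Whether a spoke counts as significantly different then depends on comparing |t| with the two-tailed critical value of Student's t distribution for the computed degrees of freedom at the 5% level, a step left out of this sketch.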

The analysis of LSB2's data offers a different image for the tongue shape and position, but a

back due to an higher degree of root retraction in Tyrolean.

Figure 3a – LSB2, mean tongue shapes. Figure 3b – LSB2, radar chart of the t-test.

very similar one for the almost coincidence of the profiles in Tyrolean and in Italian. Extracted mean tongue surfaces (Fig. 3a) show a near semi-circular shape especially for Italian, with a retracted root, the dorsum held convex to the palate and the lamina pointing down. The tongue bunching up towards the postvelar zone and the absence of an alveolar constriction point to a dorsal articulation, which fits in with the acoustic analysis that shows a predominance of voiced or voiceless uvular fricatives. The statistical analysis (Fig. 3b) of the difference between the two splines shows that these thicken in the laminal and in the posterodorsal area, apparently because of a slight backwards shifting of the tongue which is still to be seen notwithstanding the poor quality of the images in the hindermost region of the tongue. The intraspeaker comparison of SB1 shows again almost an overlapping of the two contours (Fig. 4a) that display a near semicircular shape similar to that reported for LSB2: the tongue is mid bunched and the lamina is kept low while the middle of the tongue is raised towards the hard palate. This configuration allows the identification of a dorsal articulation, notwithstanding the limit in the size of the depicted palate that makes it difficult to precisely assess the place of articulation. Nevertheless the auditory analysis of this speaker`s production The analysis of LSB2's data offers a different image for the tongue shape and position, but a very similar one for the almost coincidence of the profiles in Tyrolean and in Italian. Extracted mean tongue surfaces (Fig. 3a) show a near semi-circular shape especially for Italian, with a retracted root, the dorsum held convex to the palate and the lamina pointing down. The tongue bunching up towards the postvelar zone and the absence of an alveolar constriction point to a dorsal articulation, which fits in with the acoustic analysis that shows a predominance of voiced or voiceless uvular fricatives. 
The statistical analysis (Fig. 3b) of the difference between the two splines shows that these thicken in the laminal and in the posterodorsal area, apparently because of a slight backwards shifting of the tongue which is still to be seen notwithstanding the poor quality of the images in the hindermost region of the tongue.

by the two evaluators converges on an auditorily identical [\_\_] as the most recurrent variant, which is further confirmed by the spectrographic analysis. As for the similarity between the two profiles, the t-test (Fig. 4b) shows that the difference among the two splines almost equals zero, except for two points in the foremost part of the imaged tongue26 and for a point in the The intraspeaker comparison of SB1 shows again almost an overlapping of the two contours (Fig. 4a) that display a near semicircular shape similar to that reported for LSB2: the tongue is mid bunched and the lamina is kept low while the middle of the

41

26 Even if statistically significant data is not revealing given the poor definition of the tongue profile at the considered point.

<sup>25</sup> 2-tailed t-test, unequal variances and sample sizes, Welch-Satterwaite equation as performed by AAA. t-test was

significant at 5%.

<sup>12</sup> 2-tailed t-test, unequal variances and sample sizes, Welch-Satterwaite equation as performed by AAA. t-test was significant at 5%.

tongue is raised towards the hard palate. This configuration allows the identification of a dorsal articulation, notwithstanding the limit in the size of the depicted palate that makes it difficult to precisely assess the place of articulation. Nevertheless the auditory analysis of this speaker's production by the two evaluators converges on an auditorily identical [ʁ] as the most recurrent variant, which is further confirmed ̞ by the spectrographic analysis. As for the similarity between the two profiles, the t-test (Fig. 4b) shows that the difference among the two splines almost equals zero, except for two points in the foremost part of the imaged tongue13 and for a point in the back due to an higher degree of root retraction in Tyrolean.

Figure 4a – SB1, Mean tongue shapes Figure 4b – SB1, radar charterization of the t-test

For speaker SB2 (Fig. 5a) the tongue is held convex to the palate with the anterodorsum raising up and the tongue tip down pointing to the alveolar ridge, thus defining a dorsal

articulation. The impressionistic and the statistical (fig. 5b) analysis on the difference between the two splines show that even if the two profiles are broadly comparable in shape, in Tyrolean the tongue tends to be lower than in Italian, especially in the dorsum. However, the radar chart also depicts how statistically significant differences emerge in the antero-dorsum rather than in the root. Palate Italian Zero line Difference line For speaker SB2 (Fig. 5a) the tongue is held convex to the palate with the anterodorsum raising up and the tongue tip down pointing to the alveolar ridge, thus defining a dorsal articulation. The impressionistic and the statistical (Fig. 5b) analysis on the difference between the two splines show that even if the two profiles are broadly comparable in shape, in Tyrolean the tongue tends to be lower than in Italian, especially in the dorsum. However, the radar chart also depicts how statistically significant differences emerge in the antero-dorsum rather than in the root. Figure 4a – SB1, Mean tongue shapes Figure 4b – SB1, radar charterization of the t-test For speaker SB2 (Fig. 5a) the tongue is held convex to the palate with the anterodorsum raising up and the tongue tip down pointing to the alveolar ridge, thus defining a dorsal articulation. The impressionistic and the statistical (fig. 5b) analysis on the difference between the two splines show that even if the two profiles are broadly comparable in shape, in Tyrolean the tongue tends to be lower than in Italian, especially in the dorsum. However, the radar chart also depicts how statistically significant differences emerge in the antero-dorsum rather than in the root.

tongue is kept lower but, for the foremost portion, which is higher. Nevertheless the radar chart associated with the t-test (Fig. 6b) illustrates that the difference between the two splines Figure 5a – SB2, mean tongue shapes. Figure 5b – SB2, radar chart of the t-test.

Italian

Italian

involve the position of the tongue, rather than its shape.

Palate Tyrolean

Palate Tyrolean

involve the position of the tongue, rather than its shape. contours similar to those by LSB2 with a clear mid bunching of the tongue: the front, blade and tip are low, while the middle of the tongue is raised towards the palate to articulate even spectrographically similar uvular approximants. The Cartesian space shows that in Italian the <sup>13</sup> Even if statistically significant data is not revealing given the poor definition of the tongue profile at the considered point.

is significant but for the anterodorsal and the radical portion. The differentiation thus seems to

tongue is kept lower but, for the foremost portion, which is higher. Nevertheless the radar chart associated with the t-test (Fig. 6b) illustrates that the difference between the two splines is significant but for the anterodorsal and the radical portion. The differentiation thus seems to

The intraspeaker comparison of SB3 profiles (Fig. 6a) shows again two broadly comparable

42

For speaker SB4, Figure 7a displays that both in Tyrolean and in Italian the tongue is kept smoothly convex to the palate, with no bunching or tip raising. Even if similar in shape, the intra-speaker comparison of tongue profiles via the t-test reports a significant differentiation

42

For speaker SB4, Figure 7a displays that both in Tyrolean and in Italian the tongue is kept smoothly convex to the palate, with no bunching or tip raising. Even if similar in shape, the intra-speaker comparison of tongue profiles via the t-test reports a significant differentiation

Figure 6a – SB3, Mean tongue shapes Figure 6b – SB3, radar charterization of the t-test

Figure 6a – SB3, Mean tongue shapes Figure 6b – SB3, radar charterization of the t-test

Zero line Difference line

Zero line Difference line

Difference line

rather than in the root.

Palate

Tyrolean

Palate Tyrolean

The intraspeaker comparison of SB3 profiles (Fig. 6a) shows again two broadly comparable contours similar to those by LSB2 with a clear mid bunching of the tongue: the front, blade and tip are low, while the middle of the tongue is raised towards the palate to articulate even spectrographically similar uvular approximants. The Cartesian space shows that in Italian the tongue is kept lower but, for the foremost portion, which is higher. Nevertheless the radar chart associated with the t-test (Fig. 6b) illustrates that the difference between the two splines is significant but for the anterodorsal and the radical portion. The differentiation thus seems to involve the position of the tongue, rather than its shape. Figure 5a – SB2, Mean tongue shapes Figure 5b – SB2, radar charterization of the t-test The intraspeaker comparison of SB3 profiles (Fig. 6a) shows again two broadly comparable contours similar to those by LSB2 with a clear mid bunching of the tongue: the front, blade and tip are low, while the middle of the tongue is raised towards the palate to articulate even spectrographically similar uvular approximants. The Cartesian space shows that in Italian the tongue is kept lower but, for the foremost portion, which is higher. Nevertheless the radar chart associated with the t-test (Fig. 6b) illustrates that the difference between the two splines is significant but for the anterodorsal and the radical portion. The differentiation thus seems to involve the position of the tongue, rather than its shape.

Figure 4a – SB1, Mean tongue shapes Figure 4b – SB1, radar charterization of the t-test For speaker SB2 (Fig. 5a) the tongue is held convex to the palate with the anterodorsum raising up and the tongue tip down pointing to the alveolar ridge, thus defining a dorsal articulation. The impressionistic and the statistical (fig. 5b) analysis on the difference between the two splines show that even if the two profiles are broadly comparable in shape, in Tyrolean the tongue tends to be lower than in Italian, especially in the dorsum. However, the radar chart also depicts how statistically significant differences emerge in the antero-dorsum

Italian Zero line

Difference line

Difference line

For speaker SB4, Figure 7a displays that both in Tyrolean and in Italian the tongue is kept Figure 6a – SB3, mean tongue shapes. Figure 6b – SB3, radar chart of the t-test.

42 intra-speaker comparison of tongue profiles via the t-test reports a significant differentiation For speaker SB4, Figure 7a displays that both in Tyrolean and in Italian the tongue is kept smoothly convex to the palate, with no bunching or tip raising. Even if similar in shape, the intra-speaker comparison of tongue profiles via the t-test reports a significant differentiation which affects almost each point and, again, is due to the different position the tongue takes, lowered and retracted in Tyrolean, with respect to the palate. Surprisingly both the auditory and the spectrographic analysis gives different outcomes for the two languages and dorsouvulars prevails in Tyrolean, while alveo-coronals are predominant in Italian. which affects almost each point and, again, is due to the different position the tongue takes, lowered and retracted in Tyrolean, with respect to the palate. Surprisingly both the auditory and the spectrographic analysis gives different outcomes for the two languages and dorsouvulars prevails in Tyrolean, while alveo-coronals are predominant in Italian.

smoothly convex to the palate, with no bunching or tip raising. Even if similar in shape, the

the two languages in different ways: in Tyrolean her tongue forms a smooth convex curve with no distinct bunching; the root is slightly retracted, the body leaned towards the back of the mouth and the tip is far from determine a point of primary constriction next to the alveolar ridge. On the contrary, when articulating a rhotic in Italian, the body of the tongue is more advanced and presents a mid-bunching; the middle is more raised towards the hard palate while the blade and the tip are kept high, at least higher than in Tyrolean. Besides a saddle is to be spotted, which probably coincides with the place where the dorsum and the lamina

The visual impression of a difference among the two mean splines for the two languages is further confirmed by the statistic and auditory analysis: as reported in the radar chart (Fig. 8b), there are significant differences both in the posterodorsal/radical region and in the laminal area; and as derived from the auditory analysis the speaker goes for apical rhotics

43

On the other hand, simultaneous bilinguals tend to differentiate among articulation patterns in the two languages, even if with varying degrees: indeed as reported in Table 4 while in the case of SB1 the two splines significantly differ in only two points, for the rest of informants

Figure 8a – SB5, Mean tongue shapes Figure 8b – SB5, radar charterization of the t-test Data presented so far allow us to answer the first question on whether adult bilinguals display different patterns of articulation for rhotics in the two languages they speak and to affirm that apparently no space for differentiation is left for very late bilinguals. In fact they tend to almost completely transfer the shape and position of articulation from one language to the other and to articulate rhotics in the second language they learnt as if that were instances

Figure 7a – SB4, mean tongue shapes. Figure 7b – SB4, radar chart of the

([\_]) in Italian and for uvular rhotics ([\_]) in Tyrolean.

Italian Tyrolean

of the first language they learnt.

test Examination of tongue curves for speaker SB5 shows (Fig. 8a) that she articulates rhotics in t-est.

Zero line Difference line

diverge.

Palate

Zero line Difference line

which affects almost every point and, again, is due to the different position the tongue takes with respect to the palate, being lowered and retracted in Tyrolean. Surprisingly, both the auditory and the spectrographic analyses give different outcomes for the two languages: dorso-uvulars prevail in Tyrolean, while alveo-coronals are predominant in Italian.

Figure 7a – SB4, mean tongue shapes. Figure 7b – SB4, radar chart of the t-test.

Examination of tongue curves for speaker SB5 shows (Fig. 8a) that she articulates rhotics in the two languages in different ways: in Tyrolean her tongue forms a smooth convex curve with no distinct bunching; the root is slightly retracted, the body leans towards the back of the mouth, and the tip is far from forming a point of primary constriction next to the alveolar ridge. By contrast, when articulating a rhotic in Italian, the body of the tongue is more advanced and shows mid-bunching; the middle is raised further towards the hard palate, while the blade and the tip are kept high, at least higher than in Tyrolean. In addition, a saddle can be seen, which probably coincides with the place where the dorsum and the lamina diverge.

The visual impression of a difference between the two mean splines for the two languages is further confirmed by the statistical and auditory analyses: as reported in the radar chart (Fig. 8b), there are significant differences both in the posterodorsal/radical region and in the laminal area; and, as derived from the auditory analysis, the speaker opts for apical rhotics ([ɾ]) in Italian and for uvular rhotics ([χ]) in Tyrolean.

Figure 8a – SB5, mean tongue shapes. Figure 8b – SB5, radar chart of the t-test.

Data presented so far allow us to answer the first question, namely whether adult bilinguals display different patterns of articulation for rhotics in the two languages they speak, and to affirm that apparently no space for differentiation is left for very late bilinguals. In fact, they tend to transfer the shape and position of articulation almost completely from one language to the other, and to articulate rhotics in the language they learnt second as if they were instances of the language they learnt first.

On the other hand, simultaneous bilinguals tend to differentiate between articulation patterns in the two languages, albeit to varying degrees: as reported in Table 4, while in the case of SB1 the two splines differ significantly at only two points, for the rest of the informants the number of points increases to more than half of the traceable profile, as in the cases of SB3 and SB4.


Table 4 – Number of significantly different points between the two splines.

As already mentioned, intraspeaker differences in tongue splines might reflect a change in the position of the tongue or modifications in its shape. Changes in the position of portions of the tongue seem to affect SB1, SB2, SB3 and SB4 especially, and to ensue from the placement of the postdorsum, which in Tyrolean tends to be moved towards the uvula and the pharynx. Minor changes in position also affect the lamina, which in Italian (in all but one case, SB4) is shifted upwards; this is sometimes unexpected, as in the cases where uvular rhotics are produced.

Changes in the shape of the tongue are rarer from the perspective of intraspeaker comparison, and are actually limited to SB5, who in Italian keeps the antero-dorsum and the lamina high, towards the hard palate and the alveoli. This modification is in keeping with the different acoustic outputs in the two languages (coronal and dorsal) but, counter-intuitively, is not found in SB4, despite a similar front-back alternation in her auditory productions.

These results on intraspeaker changes in tongue position and shape are relevant to the phonetic characterization of simultaneous bilingual speakers because they point to a possible space for differentiation in the articulation of rhotics in the two languages, notwithstanding the absence of overt auditory differentiation between the two languages and, *de facto*, the transfer of a phone from one system to the other. This is important because it shows how articulatory data can add to acoustically based theories of bilingual phonology, introducing previously unattested considerations such as auditory invariance coupled with articulatory differentiation<sup>14</sup>. It also allows modeling of the effects of language contact within adult simultaneous bilinguals, who as individual speech producers may serve as precursors of language change.

<sup>14</sup> Please refer to Vietti (2012) for an account of acoustic invariance coupled with articulatory differentiation in the uvular fricatives of a simultaneous Italian/Tyrolean bilingual.

#### 3.3.2 Interspeaker comparison

In order to address the second question, namely whether the patterns of articulation of adult bilinguals resemble those of monolinguals, we ran an interspeaker comparison between the simultaneous bilinguals SB1-5 and the very late sequential bilinguals LSB1 and LSB2, who acted as control subjects: in a region characterized by societal multilingualism such as South Tyrol, it is almost impossible to find truly monolingual speakers.

Our comparison is impressionistic and based on the superimposition of the different speakers' palates based on translations and rotations (but not on rescalings) aimed at identifying the points of maximum coincidence in the areas of the alveolar ridge and the hard palate as shown in Fig. 9.
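The superimposition described above was carried out impressionistically. Purely as an illustration of what an alignment by translation and rotation without rescaling involves, the following minimal sketch fits one palate trace onto another in the least-squares sense. It is not the procedure used in this study: the function name `rigid_align`, the assumption that both traces are sampled at corresponding points, and the coordinates themselves are hypothetical.

```python
import math


def rigid_align(source, target):
    """Fit 2D points `source` onto `target` using rotation and translation only.

    Both arguments are lists of (x, y) pairs of equal length, assumed to be
    sampled at corresponding locations along the two palate traces.
    Returns the aligned source points and the rotation angle in radians.
    """
    n = len(source)
    if n == 0 or n != len(target):
        raise ValueError("traces must be non-empty and of equal length")

    # Centroids: the translation component aligns these two points.
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(p[0] for p in target) / n
    ty = sum(p[1] for p in target) / n

    # Closed-form least-squares rotation for centred 2D point sets.
    dot = 0.0
    cross = 0.0
    for (x, y), (u, v) in zip(source, target):
        x, y, u, v = x - sx, y - sy, u - tx, v - ty
        dot += x * u + y * v
        cross += x * v - y * u
    theta = math.atan2(cross, dot)

    c, s = math.cos(theta), math.sin(theta)
    aligned = [
        (c * (x - sx) - s * (y - sy) + tx, s * (x - sx) + c * (y - sy) + ty)
        for x, y in source
    ]
    return aligned, theta


# Hypothetical palate trace (arbitrary units) and a rotated, shifted copy of it.
palate_a = [(0.0, 0.0), (10.0, 4.0), (20.0, 6.0), (30.0, 5.0)]
angle = math.radians(12.0)
palate_b = [
    (math.cos(angle) * x - math.sin(angle) * y + 3.0,
     math.sin(angle) * x + math.cos(angle) * y - 2.0)
    for x, y in palate_a
]

aligned_b, theta = rigid_align(palate_b, palate_a)
```

Because no scaling factor is estimated, differences in palate size between speakers remain visible after alignment, which matches the decision reported above to avoid rescalings.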


Figure 9 – Interspeaker comparison: LSB1 (blue), LSB2 (red) mean tongue shapes.

Fig. 9 depicts the static articulation for mean rhotics in the two very late bilinguals LSB1 (Italian-dominant, in blue) and LSB2 (Tyrolean-dominant, in red). This qualitative analysis clearly illustrates that the sequential bilinguals use two radically different tongue configurations and allows us to identify the two different places of articulation, the coronal (alveolar) and the dorsal (uvular). This is not at all unexpected, given that according to previous research (see also Romano, this volume) coronal articulations are quasi-standard in Italian while uvular articulations are quasi-standard in the Tyrolean dialect.

In order to answer our second research question, however, the comparison between LSB1 and LSB2 on the one hand and SB4 and SB5 on the other is of greater relevance than that between LSB1 and LSB2 or that involving SB1-SB3, because the two simultaneous bilinguals SB4 and SB5 are the only speakers to modify, in an auditorily perceptible manner, the place of articulation of rhotics in the two languages.

Figure 10a – LSB1, SB4, SB5 Italian. Figure 10b – LSB2, SB4, SB5 Tyrolean.

Figures 10a and 10b report the graphical comparisons of tongue profiles in Italian and Tyrolean respectively for LSB1, LSB2, SB4 and SB5, and show that for both languages the mean tongue profiles of the simultaneous bilinguals diverge from the averaged profiles of the Italian-dominant and of the Tyrolean-dominant sequential bilinguals.

As regards Italian, the tongue profiles of the simultaneous bilinguals SB4 and SB5 differ from that of LSB1. In the case of SB4 there is no steep raising of the postero-dorsum but a greater degree of root retraction and a moderate lowering of the lamina instead. SB5, on the other hand, displays a greater degree of root retraction and a significant lowering of the middorsum.

As regards Tyrolean, the picture is similar and the tongue profiles for SB4 and SB5 differ from that of LSB2. Indeed, even if SB4's tongue shape is similar to that of LSB2 and even if posterodorsum and root almost coincide, the anterodorsum is kept significantly lower by the simultaneous bilingual. SB5, on the other hand, converges towards the root retraction that is also typical of the Tyrolean-dominant speaker, but still shows a significant lowering of the middorsum.

The interspeaker comparison of tongue profiles thus shows that the patterns of articulation for rhotics used by simultaneous bilinguals are different from those used by almost monolingual speakers. This result is relevant because it shows that simultaneous bilinguals might differ from very late sequential bilinguals in the articulatory implementation of the same rhotic phonetic segments, not only in the sense that, at least articulatorily, they maintain cross-language phonetic differences, but also in that they develop new, third articulatory patterns that diverge from those of native speakers.

#### 4. Discussion

The collected data, and especially the intraspeaker comparison, show that very late sequential bilinguals do not present distinct articulatory patterns for rhotics in the two languages, while the simultaneous bilinguals do, albeit to varying degrees. In addition, the interspeaker comparison shows that the articulatory patterns for rhotics used by simultaneous bilinguals differ from those used by the very late sequential bilingual speakers who acted as control subjects. Differentiation of patterns might occur as a consequence of articulatory, acquisitional or sociophonetic factors.

In articulatory terms, marked intraspeaker differentiation as exploited by the simultaneous bilinguals SB4 and SB5 is used effectively to reach different articulatory targets in the two languages and to make the speaker sound like a native monolingual in each of the two codes. Marked intraspeaker differentiation of this kind, however, seems to be counter-economical: rhotics are known not only for their interchangeability (the coronal/dorsal opposition is indeed non-pathological in both Italian and Tyrolean), but also for the complex constellations of gestures that are required to articulate them (Proctor 2009). This might be the reason for developing third articulatory patterns that apparently allow for an economical reuse of most of the articulatory program, except for fine tunings of tongue root and tip positions, which seem comparable to those attested in speakers SB1 and SB3. Indeed these speakers, who resort to an at least auditorily identical [ʁ̞] in both languages, build the auditorily undetectable<sup>15</sup> but articulatorily visible opposition between rhotics in the two languages on just one parameter, namely a change in tongue position, and specifically raising vs. lowering or advancing vs. retracting of the whole tongue in Italian and Tyrolean respectively.

From the acquisitional perspective, intraspeaker differentiation of patterns as reported for simultaneous bilinguals could occur as a consequence of the particular organization of bilingual speakers' phonetic system. In this sense a proposal such as the one put forward by Flege (1995) on the basis of perceptual and acoustic data in the Speech Learning Model (SLM) is of interest, even if it only partially suits our records. The transfer of articulatory patterns from the first to the second learnt language attested for LSB1 and LSB2 is compatible with the mechanism of phonetic category assimilation that, according to the SLM, should affect speakers with limited exposure to the second language (in both qualitative and quantitative terms, namely Age of Arrival and especially Length of Residence). The elaboration of third, merged patterns of articulation that apparently draw on L1 and L2 input, however, should not be a characteristic of speakers exposed to the two languages for a long time. On the contrary, those speakers should rather operate a phonetic category dissimilation so as to increase the phonetic difference between the realizations in the two languages.

<sup>15</sup> To further prove this statement a broader auditory analysis and/or a rigorous perceptual study is necessary.

Probably the SLM fails to account for data such as those presented here not only because the theory was not developed to explain articulatory data, but also because of the special nature of rhotics with respect to their perceptibility: see, for example, the research by Engstrand et al. (2007) on the perceptual bridge in rhotics, which showed how coronal and dorsal rhotics may occasionally be confused in perception so that "intended coronals could be interpreted as dorsals or viceversa" (2007:176). Most of all, it fails because the data compared here refer to simultaneous and not to (very late) sequential bilinguals.

Moreover, our data pertain to simultaneous bilinguals raised in a situation of societal bilingualism. As this difference is of sociophonetic relevance, it should not be disregarded; indeed it should be stressed here that attitudinal factors might also play a role. In particular, the decision of simultaneous bilinguals to characterize themselves as members of one of the two established linguistic communities or as members of a truly bilingual community might favor the use of two separate patterns of articulation (as in SB4 and SB5) or the development of a third system of articulation (as for SB1, SB2 and SB3) to index their respective identities. In this sense rhotics would prove once more to be the preferred markers of local identity and/or of social variation selection.

#### 5. Conclusion

This study aimed to add new data and details to previous work on the phonetics of rhotics in Italian and Tyrolean, and showed how variable this class of sounds proves to be if considered from an articulatory perspective. In addition, it aimed to offer new data for the study of the phonological systems of bilingual speakers and showed how previous proposals such as SLM can be put to the test simply through the adoption of articulatory data.

However, the authors of this paper are well aware that the results are preliminary, and therefore not conclusive. First of all, there were limitations in the size of the dataset used to derive the observations, and especially in the representativeness of those observations. Secondly, the image quality was sometimes poor; in particular, the image resolution was at times low enough to distort the derived representation of the tongue shape. Lastly, the authors recognize the limitations of the impressionistic technique used to evaluate the data, especially in comparison with the quantitative analysis permitted by techniques such as SS-ANOVA (Davidson 2006) or the nearest neighbor distance (Zharkova & Hewlett 2009). This, together with interspeaker normalization, will be addressed in future research.

#### Acknowledgments

We would like to thank the two reviewers for their insightful comments and suggestions. A grateful acknowledgment is also due to Jim Scobbie and Jane Stuart-Smith for their patience in listening to our thoughts and to Alan Wrench and Christian Zeni for their unconditional helpfulness.

#### References


*Clinical Linguistics and Phonetics* 19(6-7). 455-502.

Tonelli, Livia. 2002. *Regionale Umgangssprachen*. Padova: Unipress.


#### Appendix 1

Italian target words

privo, prato, prude, triste, trave, truce, cricca, crampo, crudo, briga, bravo, bruco, dritto, drago, druso, grave, grido, gruppo.

Tyrolean target words

prigl, pratzl, prunzen, trichtor, traktor, truhe, krischtn, kravall, krustn, brikett, brathiandl, bruscht, driber, dran di, druckn, grint, graf, gruslig.

# Part II

Phonetics and phonology

## Articulatory coordination in obstruent-sonorant clusters and syllabic consonants: Data and modelling

Philip Hoole<sup>1</sup>, Marianne Pouplier<sup>1</sup>, Štefan Beňuš<sup>2</sup> & Lasse Bombien<sup>1</sup>

<sup>1</sup> Institute of Phonetics and Speech Processing, Munich University

<sup>2</sup> Constantine the Philosopher University, Nitra; Institute of Informatics, Slovak Academy of Sciences

#### Abstract

The first of two studies in this paper (both using electromagnetic articulography) focused on onset clusters in German and French. Less overlap of C1 and C2 was found in plosive-nasal and plosive-rhotic clusters compared to plosive-laterals. Articulatory modeling was used to identify why the preferred coordination patterns are acoustically advantageous, and implications for metathesis and other diachronic processes are discussed. The second study analyzed the syllabic consonants /l/ and /r/ in Slovak. These consonants did not become kinematically more 'vocalic' in nuclear compared to marginal position. However, nuclear consonants preferred low-overlap coordination with the preceding consonant, compared to onset clusters and to vocalic syllables. We suggest that a low overlap setting favours the emergence of syllabic consonants.

#### 1. Introduction

We consider two areas in which rhotics have proved fruitful for arriving at a better understanding of principles of articulatory coordination in consonant sequences. Both studies are based on recently acquired articulatory (EMA) data. For the first area we look at onset clusters consisting of plosive plus lateral, nasal, or rhotic in German and French. The overall goal is to understand why these obstruent-sonorant clusters differ synchronically in their frequency of occurrence across languages (and differ in diachronic stability). The second area focuses on syllabic consonants (lateral and rhotic) in Slovak, seeking, in a rather similar vein, to understand why syllabic consonants are typologically rare, and, concomitantly, what factors may favour their emergence when they do occupy a prominent position in the sound structure of a language, as is the case in Slovak. Since for the syllabic consonants we focus in particular on their coordination with adjacent consonants, there are substantial superficial similarities in the kinds of sound sequences examined in both parts of the paper. Both parts are also united at a less superficial level in that they aim for a better understanding of general principles of coordinating consonant with consonant and consonant with vowel, and of how these principles are affected by position in the syllable and the segmental make-up of the sound sequences involved (for more background to our overall approach see e.g. Pouplier 2012).

#### 2. Obstruent-sonorant clusters in German and French

In this section we first review earlier work in which we compared clusters such as /kl/ and /kn/, and then move on to more recent analysis of plosive-rhotic clusters.

#### 2.1 Plosive plus lateral and nasal

The earlier findings (e.g. Hoole et al. 2009; Bombien et al. 2010; Bombien et al. submitted) revealed a consistent pattern of less articulatory overlap between C1 and C2 in German clusters such as /kn/ compared to /kl/.

$$\text{Overlap (normalized \%)} = \frac{\text{Offset}_2 - \text{Onset}_4}{\text{Offset}_4 - \text{Onset}_2} \times 100$$

More positive values indicate more overlap of Phase 2 and Phase 4.

Figure 1 – Illustration of measurement of articulatory overlap using EMA data. Top panel: audio; middle panel: vertical component of tongue-tip movement; bottom panel: vertical component of tongue-back movement. Phases 1 and 3 extend from onset of movement towards consonant target up to attainment of target position; both time points are based on a 20% velocity criterion. Phases 2 and 4 delimit the target plateau region. In the formula for overlap calculation 'Offset\_2' refers to the time point of the right boundary of Phase 2 (analogously for other time points).

Fig. 1 illustrates how these measurements were carried out using EMA sensors (Carstens AG500 articulograph) located on tongue-tip (indexing constriction for /l/ and /n/) and tongue-dorsum (indexing constriction for /k/).

Various overlap measures have been suggested in the literature. The one used here is based on onset and offset of the target plateau regions for C1 and C2. If (Offset\_2 - Onset\_4) in the formula is positive this indicates that the articulatorily defined constriction for C2 has been reached before the constriction for C1 has been released. If this difference value is negative (actually the normal case in our data) this indicates a lag between the end of C1 constriction and the beginning of C2 constriction. To account to some extent for differences in speech rate between utterances and subjects the values are normalized by the total duration of the constriction phases (i.e. by the difference in time between the offset of C2 and the onset of C1).
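As a worked illustration of the measure just described, the following minimal sketch computes the normalized overlap from the four plateau time points. The function and argument names are assumptions introduced here for readability, and the time values are invented for the example; they are not taken from the EMA recordings.

```python
def normalized_overlap(onset_c1, offset_c1, onset_c2, offset_c2):
    """Normalized articulatory overlap (%) between the C1 and C2 target plateaus.

    onset_c1 / offset_c1 correspond to Onset_2 / Offset_2 (Phase 2, C1 plateau);
    onset_c2 / offset_c2 correspond to Onset_4 / Offset_4 (Phase 4, C2 plateau).
    All times must be in the same unit (e.g. seconds).
    """
    total = offset_c2 - onset_c1  # onset of C1 plateau to offset of C2 plateau
    if total <= 0:
        raise ValueError("C2 plateau must end after C1 plateau begins")
    return (offset_c1 - onset_c2) / total * 100.0


# Hypothetical time points in seconds, invented for illustration only.
# C2 plateau begins 20 ms after the C1 plateau ends: a lag, so the value is negative.
lag_case = normalized_overlap(0.100, 0.180, 0.200, 0.300)

# C2 plateau begins 20 ms before the C1 plateau ends: true overlap, so the value is positive.
overlap_case = normalized_overlap(0.100, 0.200, 0.180, 0.300)
```

Dividing by the span from the onset of the C1 plateau to the offset of the C2 plateau makes values from utterances of different durations roughly comparable, which is the purpose of the normalization mentioned above.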

The finding of a low degree of overlap in /kn/-clusters is probably to be explained by the fact that premature lowering of the soft palate for /n/ would destroy the acoustic characteristics of the /k/-burst. This interpretation has been confirmed by modelling work using TADA (Nam et al. 2009). TADA is an articulatory synthesis application based on task dynamics and the coupled oscillator model of syllable structure. It allows gestural parameters to be systematically modified, with the final synthesis being performed by generating control parameters for the pseudo-articulatory synthesizer HLSyn (Hanson & Stevens 2002). The connection with HLSyn is particularly interesting in this case, because HLSyn also synthesizes the pressures and flows in the vocal tract resulting from the articulatory input specification. Accordingly, it was possible to observe that when a plosive-nasal cluster was synthesized with TADA's default coordination relations, the intraoral pressure declined prematurely during the plosive because of nasal leakage, resulting in the absence of a burst at the articulatory release of the plosive. When we used a low-overlap coordination topology originally suggested by Goldstein et al. (2009) to capture the difference between so-called homogeneous (high-overlap) and non-homogeneous (low-overlap) clusters in Georgian, the air-pressure trace became more typical of a plosive. However, for a completely satisfactory result it was also necessary to adjust the gestural parameters of the velar gesture itself (rather than just globally adjusting the overlap between C1 and C2) to ensure that it made a sharp transition from closed during oral closure for the plosive to open for the following nasal. Thus it seems that plosive-/n/ clusters may be plausibly regarded as physiologically costly.

Fig. 2 compares the air-pressure traces resulting from the default and the 'tuned' coordination parameters when synthesizing a [pn] consonant sequence (preceded and followed by a vowel). The key points to notice are that in the curve labeled 'untuned' the peak air pressure is not quite as high, and also does not maintain a plateau after reaching its maximum at about 100 ms on the time axis. Since labial closure is not released until about 150 ms this indicates nasal leakage of air in the untuned case.

Figure 2 – Intraoral air pressure in synthesized /pn/ sequence comparing standard coordination parameters ('untuned') with adjusted ones ('tuned').

Overall, these results fit well into our guiding hypothesis that 'successful' clusters (/kl/ is clearly diachronically more stable than /kn/) are those that offer a good compromise between parallel transmission of segmental information (efficient production) and adequate recoverability in perception (cf. Chitoran et al. 2002), i.e. key acoustic features of /kl/ would be maintained even at high overlap, whereas /kn/ would suffer from reduced perceptibility due to impairment of burst characteristics from nasal air leakage.

#### 2.2 Plosive-rhotic clusters

More recently, we have compared the onset clusters /pl, bl/ with /pr, br/ (and also /fl/ with /fr/) for five speakers of French and four of German<sup>1</sup>. The basic procedure remains as in Fig. 1, except that now a sensor on the lower lip is used to analyze articulatory activity for C1, and, since all speakers produced a dorsal /r/, the tongue-back sensor was used to analyze C2 in the /r/-clusters. The plosive-rhotic clusters are of interest precisely because it is not immediately clear what overlap pattern to expect.

<sup>1</sup> We will be using /r/ as a phonemic symbol to indicate what was in fact a dorsal articulation in the uvular region: approximant or voiced fricative following voiced C1, usually voiceless fricative following voiceless C1. A fifth German speaker who produced an apical variant was left out of consideration here. Obviously, a systematic comparison of apical and dorsal rhotics would be an interesting task for future work.

On the one hand, based on sources such as Vennemann (2000, 2012) there seems no reason to assume that /r/-clusters are disfavoured compared to /l/-clusters (if anything, the reverse). On the other hand, there are well-documented cases of instability involving r-clusters, namely metathesis such as the following<sup>2</sup>:


In fact, there was a very consistent result of *lower* overlap in the /r/-clusters compared to the /l/-clusters (see Fig. 3).

Figure 3 – Articulatory overlap in onset clusters /fl, fr, pl, pr, bl, br/ (from left to right in each panel). Overlap computed as illustrated in Fig. 1: for /l/-clusters overlap of lip and tongue-tip; for /r/-clusters overlap of lip and dorsum. Averaged over 4 speakers of German and 5 speakers of French. More negative values indicate less overlap. Error bars indicate standard error of mean over speakers.

The examples of metathesis just mentioned would emerge rather naturally from low overlap between the consonants of the onset cluster, particularly if this were accompanied in turn by a high degree of overlap between the rhotic and the following vowel. This would indeed be a prediction of the c-center principle of coordination identified by Goldstein et al. (2009): as the number of elements

<sup>2</sup> A reviewer suggested that these examples might be better captured by vowel insertion before the rhotic followed by deletion of the original post-rhotic vowel, rather than metathesis in the traditional sense. Such a scenario would fit in equally well with the patterns of gestural shift and the link between gestural overlap and vowel epenthesis that we discuss below. The label attached to these examples is less crucial than the basic point that the seeds of change and instability may be found in specific coordination relations.

in the onset increases, the left edge of the onset moves to the left and the right edge to the right, leaving the center of the onset in the same position relative to an anchor point in the vowel, regardless of the number of elements in the onset. Extending this to the present case, a complex onset with a low degree of overlap should show a particularly pronounced right shift of the rightmost consonant over the vowel compared to a control simple onset condition<sup>3</sup>.
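
The c-center prediction can be made concrete with a small numerical sketch. The function names and all timing values (in ms) are hypothetical and chosen purely for illustration; they are not measurements from the corpora, and taking the c-center as the mean of the consonant plateau midpoints is an assumption of this sketch.

```python
def c_center(plateaus):
    """Mean of the plateau midpoints of all onset consonants.

    plateaus: list of (start_ms, end_ms) tuples, one per onset consonant.
    """
    midpoints = [(start + end) / 2 for start, end in plateaus]
    return sum(midpoints) / len(midpoints)


def onset_landmarks(plateaus, anchor_ms):
    """Lags (ms) from left edge, c-center and right edge to the anchor.

    A larger lag means the landmark lies further to the left of the anchor.
    """
    left = min(start for start, _ in plateaus)
    right = max(end for _, end in plateaus)
    return {
        "left_edge": anchor_ms - left,
        "c_center": anchor_ms - c_center(plateaus),
        "right_edge": anchor_ms - right,
    }


# Hypothetical timings (ms), not measured data.
simple = onset_landmarks([(100, 140)], anchor_ms=400)               # single C
complex_ = onset_landmarks([(60, 100), (140, 180)], anchor_ms=400)  # C + rhotic

print(simple)
print(complex_)
```

In this idealised case the c-center lag is identical for both onsets, while the left edge of the complex onset lies further from the anchor and the right edge lies closer to it. The lower the overlap between the two consonants, the larger both shifts become.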

Currently, only a rather small subsection of our corpora is suitable for testing this prediction because we require target items that contrast simple and complex onsets but have the same nucleus vowel and coda (in practice the formation of consonantal closure for the coda usually provides a kinematically better-defined anchor-point than a time-point in the vowel). Thus the test is not as rigorous as we would like. Fig. 4 shows the results for the items that we were able to select from the German and French corpora, namely *tat* vs. *trat* for German and *bac* vs. *braque* for French.

Figure 4 – Timing of syllable onsets with and without rhotic relative to common anchor point in coda consonant, i.e. relative to achievement of /t/ constriction target for German (left) and of /k/ constriction for French (right).

The German data basically show the pattern expected from the c-center hypothesis: the right edge of the complex onset (i.e. the right edge of /r/ in /tra:t/) is further to the right than the right edge of the simplex onset (/t/ in /ta:t/). In marked contrast, however, the right shift of the right edge is very weak in the French data, whereas the left edge of the onset (i.e. the left edge of the /b/) shifts substantially to the left. This is not the pattern that would be expected

<sup>3</sup> Russell Webb & Bradley (2009) take this line of thought even a step further by simply assuming in their optimality theory account of metathesis that the centre of the rhotic is coordinated with the centre of the vowel.

if we want to argue for a particular affinity between the rhotic and the vowel. Given these mixed results, it would clearly be premature at this stage to claim that this kind of metathesis is accounted for by a particularly strong propensity of the rhotic to overlap the following vowel (with the potential for a categorical diachronic shift in position). Nevertheless, it would clearly be interesting to follow up this line of analysis with a new corpus that is purpose-designed to provide appropriate anchor points. It is also worth noting here a further prediction that emerges from the contention that metathesis is related to the degree of consonantal overlap: based on the results shown in Fig. 3, it should be less common in clusters with lateral than in clusters with rhotic<sup>4</sup>. Currently we are not aware of any quantitative data from the sound-change literature that allows this question to be answered<sup>5</sup>.

Even though the articulatory patterns found in rhotic clusters were not necessarily the ones initially expected we have recently been able to use articulatory synthesis to gain some further insight into why speakers appear to avoid high overlap in these clusters. For this we used the VocalTractLab package (Birkholz 2012; Birkholz et al. 2006). As a point of departure we used gestural timing parameters that would give a reasonable approximation to the German syllable onset /br/ as in *brat*. The overlap between the onset consonants was then increased by 50 ms. The most striking result was that the duration of voicelessness following release of the /b/ increased substantially (sounding perceptually more like /pr/ than /br/), even though no changes were made to the synthesis control parameters directly related to voicing. This indicates that a dorsal constriction in the velar or uvular region results in aerodynamic conditions that are very unfavourable to restarting voicing (German /b/ is essentially voiceless during the labial closure) if it follows very shortly after the labial release. This illustrates quite elegantly how supraglottal coordination can have repercussions on the realization of the voicing contrast.
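
The timing manipulation itself (not the synthesis) can be sketched in a few lines. This does not use or imitate the VocalTractLab interface: the `Gesture` class, the helper functions and all timing values are hypothetical, and overlap is simplified here to the shared duration of two gesture intervals.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gesture:
    label: str
    onset_ms: float
    offset_ms: float


def overlap_ms(g1, g2):
    """Shared duration of two gestures; negative values mean a gap."""
    return min(g1.offset_ms, g2.offset_ms) - max(g1.onset_ms, g2.onset_ms)


def advance(gesture, shift_ms):
    """Return a copy of the gesture that starts and ends shift_ms earlier."""
    return replace(
        gesture,
        onset_ms=gesture.onset_ms - shift_ms,
        offset_ms=gesture.offset_ms - shift_ms,
    )


# Hypothetical gestural score for an onset like /br/ (values in ms).
labial = Gesture("labial closure", 0.0, 90.0)
dorsal = Gesture("dorsal constriction", 110.0, 190.0)

baseline = overlap_ms(labial, dorsal)                        # normal condition
high_overlap = overlap_ms(labial, advance(dorsal, 50.0))     # C2 moved 50 ms earlier
print(baseline, high_overlap)
```

With these illustrative values the dorsal constriction begins before the labial closure is released in the high-overlap condition, which is the configuration that the text associates with delayed re-initiation of voicing.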

Fig. 5 illustrates these results by showing the sonagrams for the normal and high-overlap condition.

<sup>4</sup> See Yanagawa (2003) for an illustration of how constraints on gestural overlap and cohesion may underlie certain metathetic processes in Hebrew.

<sup>5</sup> There may well, however, often be a higher rate of apparent vowel epenthesis in rhotic than lateral clusters. This is discussed in detail in section 2.3 below.

Figure 5 – Comparison of acoustic output for onset consonants of syllable /bra/ synthesized with normal overlap (left) and high overlap (right). Note greater duration of voicelessness after /b/ release in right panel.

#### 2.3 Articulatory coordination in onset clusters: Implications and further discussion

We believe that the analysis of articulatory coordination presented in the preceding sections can throw useful light on the phonological processes apparently affecting these clusters. For example, based on some interesting acoustic observations of stop+lateral and stop+rhotic clusters in French, Colantoni & Steele (2007, 2011) point to the particular prevalence of vowel epenthesis in voiced stop+rhotic sequences. Epenthesis is much rarer in voiced stop+lateral sequences and virtually non-existent in voiceless stop+rhotic sequences<sup>6</sup>. The latter in turn are claimed to be particularly affected by a process of voicing assimilation, since devoicing of the liquid is stronger in voiceless plosive+rhotic than in voiceless plosive+lateral. We feel, however, that there are grounds for caution if the claim is that e.g. /br/ and /pr/ are affected by radically different processes, at least if a process such as epenthesis is to be interpreted as a cognitive operation on the phonological representation (with the aim, in Colantoni & Steele's terms, of cluster 'simplification' or 'repair'). Looking back to Fig. 3, it is clear that the articulatory overlap between C1 and C2 is very similar for the voiced/voiceless pairs /br/ and /pr/. To us, this immediately reduces the attractiveness of assuming epenthesis just for the first case but not the second. The introduction of a vocalic element between two consonants should clearly affect the observable coordination relations between these consonants (assuming epenthesis in both cases, with the epenthetic vowel invariably voiceless in /pr/, might be a logical possibility but nonetheless not particularly attractive or useful).

<sup>6</sup> Colantoni & Steele (2007, 2011) also discuss Spanish, where the situation is different: i.e. apparent epenthesis following both voiceless and voiced stops. Probably language differences in apicality vs. dorsality are relevant here.

Essentially, we would argue that /br/ and /pr/ have a very similar gestural specification in terms of the coordination of C1 and C2. Whether an epenthetic vowel appears at the acoustic surface is simply a side-effect of the voicing properties of C1 and does not require an account in terms of phonological processes. (This idea of epenthetic vowels as a side-effect of voicing patterns is supported by the informal observation that they are much less obvious in German /br/ than French /br/: phonologically voiced plosives are indeed fully voiced in French but usually substantially devoiced in German so the strength of voicing in the period immediately following /b/ will be much weaker in German, weakening in turn any impression of a vocalic transitional element.) Note that Colantoni & Steele's observation of a weaker tendency to epenthesis in the lateral clusters also fits in well with our overlap measurements: i.e. the higher overlap for lateral than rhotic clusters. The crucial question is then what drives the low overlap in rhotic clusters. We indicated above one direction that an explanation could take: dorsal constrictions may be particularly unconducive to voicing (e.g. Ohala 1993; see also Colantoni & Steele 2011); accordingly, reducing the amount of overlap reduces the chances of an inappropriate amount of voicelessness at the release of a phonologically voiced consonant. Note that we assume that this could apply equally to German and French voiced stops despite their clear phonetic differences: excessive overlap may result in a delay in re-initiation of voicing after devoiced German /b/, but also in interruption of voicing of normally continuously voiced French /b/. 

A corollary of this line of thought also explains Colantoni & Steele's further observation of the particularly extensive devoicing of /r/ in /pr/: even if French is traditionally regarded as not having long VOT in voiceless stops, the glottal conditions at the release of /p/ are certainly not favourable to voicing, and probably remain so over the transitional period until the formation of constriction for /r/. Voicing is well known to show a hysteresis effect, in the sense that conditions for initiating voicing are more stringent than those necessary to maintain ongoing voicing (e.g. Hirose & Niimi 1987). Consequently, once voicing has ceased at the onset of /p/, re-initiation is not possible until the dorsal constriction has substantially weakened at the offset of /r/. This means that the low overlap between rhotic and plosive can on the one hand make it easier to maintain (or re-start) voicing for voiced C1 but also result in a particularly long period of voicelessness once voicing is interrupted for voiceless C1. Once again, we would argue that the devoicing of /r/ in /pr/ does not require an explanation in terms of phonological processes but is, to a first approximation, a simple coarticulatory effect of the devoicing gesture of the initial C1 in combination with the effect of a dorsal constriction on aerodynamic conditions in the vocal tract<sup>7</sup>.

#### 3. Syllabic consonants in Slovak

This second main section continues very much in the vein of the first, since it will again provide evidence that an understanding of the emergence and development of sound patterns can depend crucially on an understanding of the patterns of articulatory coordination involved.

As already mentioned in the introduction, the overall aim of the work in this section was to arrive at a better understanding of why the occurrence of syllabic consonants is highly restricted. This can be understood as part of the ongoing aim of ourselves and others to understand how sounds are modified and their coordination patterns change depending on their role in the syllable. Many investigations have compared consonants in onset vs. coda position; here we now look at consonants in nucleus position based on work carried out by Pouplier & Beňuš (2011). This leads to a number of more specific questions along the following lines:


The basic research strategy is to exploit a language such as Slovak in which the occurrence of syllabic consonants is actually quite unrestricted. In addition to the fact that specifically /l/ and /r/ occur in nucleus as well as onset and coda position, these syllabic consonants are – unlike English, German etc. – not restricted to unstressed syllables and can themselves take complex onsets (as in words like *smrt*, with nucleus /r/). Moreover, they are fully integrated into the Slovak morphology of nucleus length alternations (see Pouplier & Beňuš 2011, for further details on the linguistic background).

<sup>7</sup> We use the proviso "to a first approximation" because, extrapolating from our earlier work on German using laryngeal fiberoptic endoscopy and transillumination (Hoole 2006), we expect there to be subtleties to glottal coordination in clusters that we are not yet able to do justice to in French. This work is currently in progress. For more background to the general idea of coarticulatory devoicing see e.g. Browman & Goldstein (1986).

#### 3.1 Recordings

Basically the same EMA setup was used as for the experiments described in Section 2 above (sensors on tongue, lips, jaw). Five Slovak speakers participated. Six repetitions of each target word were recorded per speaker. Target words were embedded in the carrier phrase: *Už hovoríme \_\_\_\_\_\_\_\_ hodinu*. Examples of the target words are given in each analysis section below.

Two main sets of analyses were performed. First, the basic kinematic properties of the liquids were examined as a function of position in the syllable; second, analyses of articulatory coordination similar to those exemplified already in Fig. 1 were carried out.

#### 3.2 Basic kinematic properties of /l/ and /r/

The main thrust of this group of analyses was to determine whether the liquids became in any sense more vocalic when they formed the nucleus (as opposed to onset or coda). In terms of kinematic measurements this was defined as an expectation for longer durations, lower velocities and lower stiffness (ratio of peak velocity to amplitude) in nucleus position.
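
As a rough illustration of how such measures can be derived from a sensor trajectory, the sketch below computes duration, amplitude, peak velocity and stiffness for one movement. The function name and the synthetic raised-cosine trajectories are hypothetical, and stiffness is taken in its common kinematic sense of peak velocity divided by movement amplitude, which is an assumption of this sketch rather than a description of the original analysis pipeline.

```python
import numpy as np


def kinematic_measures(position_mm, sample_rate_hz):
    """Duration, amplitude, peak velocity and stiffness of one movement.

    position_mm: 1-D sequence of sensor positions from movement onset to offset.
    """
    position = np.asarray(position_mm, dtype=float)
    velocity = np.gradient(position) * sample_rate_hz   # mm/s
    amplitude = abs(position[-1] - position[0])         # mm
    peak_velocity = float(np.max(np.abs(velocity)))     # mm/s
    return {
        "duration_ms": 1000.0 * (len(position) - 1) / sample_rate_hz,
        "amplitude_mm": amplitude,
        "peak_velocity_mm_s": peak_velocity,
        # Assumed definition: peak velocity / amplitude (unit: 1/s).
        "stiffness_per_s": peak_velocity / amplitude,
    }


# Synthetic 10 mm raised-cosine movement, sampled so that it lasts
# 100 ms ("fast") or 200 ms ("slow"). These are not real EMA data.
t = np.linspace(0.0, 1.0, 41)
trajectory = 10.0 * (1.0 - np.cos(np.pi * t)) / 2.0
fast = kinematic_measures(trajectory, sample_rate_hz=400)
slow = kinematic_measures(trajectory, sample_rate_hz=200)
print(fast["stiffness_per_s"], slow["stiffness_per_s"])
```

Under this definition the slower movement has the lower stiffness, which is the direction expected for a more vowel-like gesture.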

The following list shows the words used to compare the kinematic properties of the three syllable positions (upper case L is used here and in Table 2 below to indicate a liquid nucleus).


Table 1 shows results averaged over the five speakers. For brevity, only duration of the consonantal constriction phase ('plateau duration'; this corresponds to Phase 4 in Fig. 1) and peak velocity are shown here. The velocity measure is based on the closing movement.


Table 1 – Mean (and standard deviation) plateau duration and peak velocity for /l, r/ as a function of syllable position.

The basic point to observe is that there is no consistent pattern distinguishing nucleus from onset and coda; the nucleus is not consistently longer and does not have lower peak velocity than the marginal positions. If only onset and nucleus are compared then there are no patterns that are consistent across both consonants. Thus at this level of analysis it does not seem that liquids take on more vocalic properties when they form the syllabic nucleus: syllabic consonants are kinematically speaking still consonants (the results given here are representative of the other measures as well, see Pouplier & Beňuš 2011 for details).

#### 3.3 Articulatory coordination

Articulatory coordination will be examined from two complementary points of view, firstly in terms of consonant-consonant coordination (comparing pairs where one member of the pair contains the nucleus and one does not), and secondly in terms of onset-nucleus coordination (comparing consonantal versus vocalic nuclei with the same onset).

#### 3.3.1 Consonant-consonant coordination

For this analysis pairs such as *mrak* vs. *mrk* (onset cluster vs. onset+nucleus) and *park* vs. *mrk* (coda cluster vs. nucleus+coda) were examined (target consonant sequence highlighted in boldface). Coordination between the consonants was captured by a measure that we will refer to as plateau lag, corresponding to the timepoint of the onset of Phase 4 minus the timepoint of the offset of Phase 2 in Fig. 1. The results are given in Table 2.


Table 2 – Mean plateau lag (and SD) for consonant-consonant sequences differing in syllable position.

Please note that since this table shows a lag measure (rather than the overlap measure used in Section 2), larger (more positive) values indicate a wider spacing between the consonants (i.e. *less* overlap). The main result to note is that the lag is greater (overlap is less) when the liquid is in the nucleus, i.e. CL and LC versus CC- and -CC, respectively. The other main result, which in fact is numerically stronger than the first result, is that lags are greater in onset than in coda position (i.e. compare the first two data rows with the bottom two data rows of the table). In traditional phonetic terminology this means that CC transitions are more open (in the sense of Catford 1977) to the left of the nucleus. Note that this applies not just to the comparison of onset clusters vs. coda clusters (comparing data rows 1 and 3 of the table) but also to the comparison of onset+nucleus vs. nucleus+coda (data row 2 vs. 4). For standard syllables with a vocalic nucleus it has become almost a commonplace observation in recent years that syllable structure is expressed in typical coordination relations among the structural elements of the syllable. The above two results make the important point that this also applies to syllables with a consonantal nucleus, i.e. these syllables, too, have internal structure: words like *mrk* are not just a simple concatenation of C+C+C<sup>8</sup>. Putting this another way: for any given sequence of consonants in Slovak, the precise coordination relations among adjacent consonants will depend on the structural position in the syllable to which each consonant is assigned.
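
The sign convention of the plateau lag can be illustrated with a minimal sketch. It assumes that Phase 2 and Phase 4 of Fig. 1 correspond to the constriction plateaus of C1 and C2 respectively; the function name and the token values are hypothetical.

```python
from statistics import mean, stdev


def plateau_lag(c1_plateau_offset_ms, c2_plateau_onset_ms):
    """Onset of the C2 plateau minus offset of the C1 plateau (ms).

    Positive values mean the plateaus are separated in time (less overlap);
    negative values mean that the two plateaus overlap.
    """
    return c2_plateau_onset_ms - c1_plateau_offset_ms


# Hypothetical (C1 plateau offset, C2 plateau onset) pairs for three tokens.
tokens = [(120.0, 150.0), (118.0, 155.0), (125.0, 149.0)]
lags = [plateau_lag(c1_off, c2_on) for c1_off, c2_on in tokens]

# Mean and sample standard deviation over tokens.
print(lags, mean(lags), stdev(lags))
```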

#### 3.3.2 Onset-nucleus coordination

The second set of coordination analyses compares words such as *blb* (lateral nucleus) with words such as *bib* (vocalic nucleus). Preliminary examination indicated that articulatory movement for the vowel could be more reliably captured in terms of time of peak velocity of the movement towards the vowel (rather than in terms of the time of attainment of a constriction plateau), so a new lag measure was defined as timepoint of peak closing velocity for nucleus minus timepoint of peak closing velocity for onset consonant. The results of peak velocity lag for the different nucleus types averaged over speakers were as follows (mean +/- standard deviation, in ms):


Lag values are shortest for the vocalic nuclei. Note, though, that there are also clear (and statistically significant) differences between the rhotic and the lateral nuclei. As in Section 2 (and as just pointed out in footnote 8) the rhotics show particularly high lag (low overlap).
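
The peak velocity lag can be sketched as follows. The function names and the sigmoid trajectories are hypothetical, both signals are assumed to share one time base, and the sketch takes the moment of maximum absolute velocity instead of isolating the closing movement explicitly.

```python
import numpy as np


def peak_velocity_time_ms(position, sample_rate_hz):
    """Time (ms) of maximum absolute velocity within the trajectory."""
    velocity = np.gradient(np.asarray(position, dtype=float))
    return 1000.0 * int(np.argmax(np.abs(velocity))) / sample_rate_hz


def peak_velocity_lag_ms(onset_position, nucleus_position, sample_rate_hz):
    """Peak velocity time of the nucleus minus that of the onset consonant."""
    return (peak_velocity_time_ms(nucleus_position, sample_rate_hz)
            - peak_velocity_time_ms(onset_position, sample_rate_hz))


# Synthetic sigmoid movements at 200 Hz; steepest points at 100 ms and 180 ms.
rate = 200
t = np.arange(100) / rate
onset_movement = 1.0 / (1.0 + np.exp(-(t - 0.10) / 0.01))
nucleus_movement = 1.0 / (1.0 + np.exp(-(t - 0.18) / 0.01))

lag = peak_velocity_lag_ms(onset_movement, nucleus_movement, rate)
print(lag)
```

A larger value of `lag` corresponds to lower overlap between onset and nucleus.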

The consonant-consonant coordination results indicated that syllables with

<sup>8</sup> It is perhaps interesting to point out a similarity with Section 2 here: lags are generally greater for the rhotics than the laterals (especially in onset position). Note that this is the case even though the rhotics are apical in Slovak but dorsal in German and French.

consonantal and vocalic nuclei have similar internal structure. The present results for onset-nucleus coordination make clear that coordination patterns for consonantal and vocalic nuclei are nevertheless not necessarily identical.

#### 3.4 Syllabic consonants: Discussion

As part of the general discussion of the Slovak results we will now try to offer some ideas as to what could be driving the low-overlap coordination pattern for the consonantal nuclei.

Useful background is given by the assumption in much work in articulatory phonology and coupled-oscillator models of syllable structure that onset consonants are timed in-phase with the following vowel (in effect, their activity starts at the same time). In normal CV syllables this does not result in the vowel being obscured by the consonant because the vowel has longer duration (lower stiffness). For the Slovak syllables with consonantal nuclei this would be a problem, however, because as we saw in the first part of the results consonants in nucleus position do not have clearly different durational properties from those in marginal positions. Thus for the onset C and the nucleus C to be reliably recoverable by the listener a low-overlap pattern of coordination is required. This links up in turn with ideas in the first part of the paper: one reason for the typological rarity of syllabic consonants may be that they require a departure from default CV coordination patterns. Put slightly differently, syllabic consonants interrupt the basic construction principle for spoken language of a slow continuous vocalic substrate with overlaid consonantal constrictions.

We believe that one reason why these typologically rare patterns were able to emerge in Slovak is that the language in general favours a low-overlap setting for consonant-consonant coordination. Thus while the values for plateau lag for onset clusters shown in the first data line of Table 2 are shorter than those for the onset+nucleus case (in the second line of the table) they are nevertheless still quite long in absolute terms, 'long' meaning that it is very typical in Slovak to find a sonorant transition ('epenthetic schwa') between C1 and C2 (see Pouplier & Beňuš 2011, for further details and illustrations).

#### 4. Conclusions

The two main parts of this paper have both tried to make the point that understanding how sound systems develop crucially requires an understanding of how the various articulatory subsystems are coordinated by the speaker. Combining articulatory measurements with articulatory modeling helps in turn to better understand what coordination patterns result in sound structures that can reliably be recovered by the listener in perception. Rhotics, and liquids in general, are particularly advantageous in these contexts because complex syllable structures make such heavy use of them.

#### Acknowledgements

DFG grant HO 3271/3-1 to Phil Hoole; DFG grant PO 1269/1-1 and ERC grant 283349 to Marianne Pouplier; Humboldt fellowship to Štefan Beňuš at IPS Munich.

#### References


## Articulating five liquids: A single speaker ultrasound study of Malayalam

James M. Scobbie<sup>1</sup>, Reenu Punnoose<sup>2</sup> & Ghada Khattab<sup>2</sup>

<sup>1</sup> CASL Research Centre, Queen Margaret University, Scotland; <sup>2</sup> School of ECLS, Newcastle University, England

#### Abstract

We investigate the lingual shapes of the five liquid phonemes of Malayalam: two rhotics, two laterals and a more problematic 5th liquid. Ultrasound is used to image the midsagittal tongue surface, mainly in an intervocalic within-word /a\_\_a/ context. The dark retroflex lateral and trill have a retracted tongue root and lowered tongue dorsum, while the three other clear liquids show advanced tongue root and dorsal raising. The 5th liquid is post-alveolar and laminal. Some additional data from an /a\_\_i/ context is considered: the liquids are slightly clearer before /i/: all have a slightly advanced tongue root, and all bar the trill show palatalization. Dynamically, the trill and retroflex lateral have a very stable tongue root in /a\_\_a/, and the 5th liquid has unusual anterior kinematic properties which require further investigation.

#### 1. Introduction

#### 1.1 Background

As part of its liquid inventory, Malayalam (a Dravidian Language of southern India, Krishnamurti 2003) has two rhotics (/r/ and /ɾ/, a trill and a tap), two laterals (/l/ and /ɭ/, an alveolar and a retroflex lateral respectively) and a 5th liquid, most commonly labelled /ʐ/. This last segment has been analysed either as a rhotic, specifically a 'voiced sublamino palatal approximant' (Asher & Kumari 1997:419) or as a lateral, specifically a 'voiced retroflex palatal fricativised lateral' (Kumari 1972:27-28). The only two experimental studies on this general topic prior to 2010 were on the related language Tamil, and in these the 5th liquid was classed as a central retroflex approximant (McDonough & Johnson 1997; Narayanan et al. 1999), but more recently, Punnoose (2011) and Punnoose et al. (2012) have explored the Malayalam liquids from both phonological and phonetic points of view (as well as comprehensively reviewing the existing literature), revealing greater complexity. These recent papers draw attention to the importance of secondary resonances in the system of oppositions as well as contrasts based on primary manner and place of articulation.

In addition, some contemporaneous research on Kannada is using ultrasound to explore retroflex stops in relation to other lingual obstruents (Kochetov et al. to appear; Kochetov et al. 2012), which will provide results that will dovetail to some extent with those presented here.

While it might be possible to definitively classify the 5th liquid either as a rhotic, a lateral or a non-rhotic central approximant, a non-deterministic or 'fuzzy' approach to phonological systems (cf. Scobbie & Stuart-Smith 2008) assumes there may be underlying phonological and phonetic reasons for its ambiguous status. The phonological patterning of /ʐ/, for example, offers a somewhat mixed picture regarding its lateral/rhotic identity, and the phonetic characteristics relevant to distinguishing it from the other liquids might likewise be variable or gradient. On the one hand, the rhotics and the 5th liquid are the only consonants not to have a singleton-geminate contrast. On the other hand, /ʐ/ and /ɭ/ tend to alternate in certain morpho-syntactic contexts. For more detail on these complex patterns, see Punnoose (2011).

Punnoose (2011) and Punnoose et al. (2012) have shed new light on the sound system using production data from eight adult males. These detailed studies constitute the first acoustic investigation of Malayalam /ʐ/. One of their aims was to consider the hypothesis that /ʐ/ is a third rhotic. Also, in (mainly auditory) impressionistic terms, they have found that /ʐ/ sounds like a clear post-alveolar central approximant, and that it appears to lack retroflexion, in the sense that it lacks strong retraction with perhaps sublaminal contact during a forward-moving constriction.

In addition, they have explored the acoustic phonetic nature of the liquid system, exploring a number of parameters that can distinguish (or not) these segments from each other. Acoustically, however, it is hard to definitively categorise /ʐ/ as either rhotic or lateral. Its first two formants (especially F2) were found to be close to the values for one rhotic (the tap) and one lateral (the alveolar approximant), namely /ɾ/ and /l/, which Punnoose and colleagues classify as having a *clear* resonance, while its F3 and F4 pattern closer to the values for /r/ and /ɭ/, which they class as having a *dark* resonance.

#### 1.2 Summary of methodological and descriptive goals

The rhotic system of Malayalam has a binary phonological opposition, which is traditionally characterized as tap vs. trill. A secondary phonetic correlate of this contrast is a clear vs. dark resonance difference (for the tap vs. trill respectively), which is the *same* secondary resonance distinction found in the alveolar and retroflex laterals (respectively). It would therefore be useful to get information on the secondary and primary articulation of both rhotics, both laterals, and the 5th liquid. No single articulatory technique can provide all the information required, but ideally we would use one which is:


Here we use high-speed Ultrasound Tongue Imaging (hs-UTI). Ultrasound scanning provides us with a mid-sagittal, two dimensional view of the tongue (Davidson 2012). Ultrasound is a convenient non-invasive technique, but it should be noted that in order to stabilize the probe to the head, protocols need to be adopted which shorten data collection time due to speaker fatigue, and also that *high-speed* UTI requires more expensive and specialized instrumentation than normal video-based UTI. The low cost and portability of the latter make it more likely to be used in fieldwork (Gick 2002, Lawson et al. 2008, Lawson et al. 2011), but its longer data-capture window is more subject to spatiotemporal artefacts (Wrench & Scobbie 2006) and thus is harder to synchronize to the acoustic signal. For many purposes, the approximately 60 frames per second that is possible with de-interlaced video UTI should prove sufficient, but to capture details of very fast moving ballistic flaps or clicks, for example, hs-UTI is likely to be preferable (Wrench & Scobbie 2011). The non-dynamic findings presented here would be easily observed using video UTI.

We will look for articulatory correlates of the clear vs. dark distinction by examining the position of the tongue root and dorsum. The relative location of anterior parts of the tongue, whether blade or tip, will be examined to reveal differences in constriction location and shape, which can be related to the primary place and manner of articulation. These comparisons will be largely qualitative and tentative, since this is a single-speaker study, and one which uses a very small dataset. Our interpretative comments are based on highly accurate instrumental data, the nature of which we will try to convey in illustrative figures below, guided by what we have found from our earlier acoustic and transcriptional phonetic research.

#### 2. Method

#### 2.1 Instrumentation

Each digital ultrasound image was created from a single scan from a probe held by a stabilizing headset (Articulate Instruments 2008, Scobbie et al. 2008). The scan rate / frame rate is flexible, up to around 400 frames per second (fps), and was set at 100 fps (one frame every 10 ms), synchronized to the audio via the high-speed Articulate Assistant Advanced™ system (Articulate Instruments 2011, Wrench & Scobbie 2011), based around an Ultrasonix SonixRP scanner remotely controlled via Ethernet from a PC. The transducer was a short-handled paediatric microconvex probe operating at 6 MHz. The field of view was set at 112.5°.
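
The relation between ultrasound frames and the audio signal can be sketched with simple arithmetic, using the 100 fps frame rate stated above together with the 22,050 Hz audio sampling rate of the acquisition system. The function names and the `first_pulse_ms` offset are hypothetical; the actual AAA synchronisation relies on a hardware pulse per scan and is not reproduced here.

```python
FRAME_RATE_HZ = 100      # ultrasound frames per second (from the text)
AUDIO_RATE_HZ = 22_050   # audio samples per second (from the text)


def frame_to_time_ms(frame_index, first_pulse_ms=0.0):
    """Time of a frame, assuming perfectly regular frames after the first pulse."""
    return first_pulse_ms + 1000.0 * frame_index / FRAME_RATE_HZ


def frame_to_audio_sample(frame_index, first_pulse_ms=0.0):
    """Nearest audio sample index for a given ultrasound frame."""
    time_ms = frame_to_time_ms(frame_index, first_pulse_ms)
    return round(time_ms * AUDIO_RATE_HZ / 1000.0)


# Each frame spans 10 ms, i.e. 220.5 audio samples.
samples_per_frame = AUDIO_RATE_HZ / FRAME_RATE_HZ
print(samples_per_frame, frame_to_audio_sample(2), frame_to_audio_sample(100))
```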

Acoustic data was recorded on the Articulate Instruments multichannel acquisition system at 22,050 Hz. In this system, a hardware pulse generated at the moment that each ultrasound scan is made enables accurate synchronisation with the acoustics. Each ultrasound frame is then stored by the AAA system as a set of raw echo-pulse return data (Figure 1a) from which a standard two-dimensional image is created when viewed (Figure 1b). A semi-automatic line-fitting process within AAA was used to trace the location of the tongue surface. Figure 1a shows how an ultrasound scan samples the space in the field of view: pulses spread out the further they get from the probe. Each image is therefore less accurate in circumferential as opposed to radial dimensions. The difference may be as much as an order of magnitude (e.g. ~3 mm as opposed to ~0.3 mm). When the tongue surface is orthogonal to the echo-pulse beams, distance-from-probe data can be very accurate, but a tongue surface that is retroflexed or otherwise positioned so that it is approximately parallel to the beams is picked up less accurately, due partly to greater scattering of the ultrasonic echo, and partly as an artefact of image processing: the tongue surface will be detected at the location of each echo-pulse beam, discretising its location in little steps, as we will see below. A potential imaging problem is that strong retroflexion, where the tongue tip curls back during a sublaminal contact, may not be visible. Such articulations will look similar to highly apical supralaminal postalveolar or palatal articulations in static images. Therefore, what appear below to be very retracted apicals are classified by us as retroflex. We may, however, be underestimating the extent of this retroflexion by not being able to detect any sublaminal contact. A paired UTI/MRI (or UTI/EMA) dataset would be useful to help investigate this issue further.

Figure 1 – Ultrasound images of a single frame. Anterior is to right. a. The raw data return shows echo data from 76 echo-pulse scan-lines radiating from the probe (the curved indent, bottom centre). They are each 8 cm long. Bright areas result from a higher intensity of echo, which is particularly strong on the tongue-air boundary. b. This standard 2D image is constructed from this echo pulse data by AAA software, interpolated in arcs to fill in gaps between scan-lines.
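
The fan geometry explains why circumferential resolution degrades with distance from the probe. The sketch below estimates the arc length between neighbouring scan-lines from the 112.5° field of view and the 76 scan-lines of 8 cm reported above. It assumes, as a simplification, that the lines are evenly spaced and radiate from a single point, ignoring the curvature radius of the microconvex probe; the function name is hypothetical.

```python
import math

FIELD_OF_VIEW_DEG = 112.5   # from the text
N_SCAN_LINES = 76           # from the Figure 1 caption
LINE_LENGTH_MM = 80.0       # 8 cm, from the Figure 1 caption


def circumferential_spacing_mm(distance_mm):
    """Arc length between neighbouring scan-lines at a given distance
    from the (assumed) common origin of the fan."""
    step_rad = math.radians(FIELD_OF_VIEW_DEG) / (N_SCAN_LINES - 1)
    return distance_mm * step_rad


for distance in (20.0, 40.0, LINE_LENGTH_MM):
    print(distance, round(circumferential_spacing_mm(distance), 2))
```

Under these assumptions the spacing at the far end of the scan-lines is about 2 mm, the same order of magnitude as the circumferential figure of ~3 mm quoted in the text, and it shrinks in proportion to distance closer to the probe.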

For analysis, we used the AAA software. We superimposed a measurement fan of 42 radial measurement lines onto images such as the one in Figure 1b (the 42 radii being independent of the number of scan lines or the size of the field of view). A single control point (or "tag") on each fanline radius was used to tie an analysis curve to the location of bright tongue-air boundary, if this indication of the tongue surface is indeed visible at all in that area of the measurement fan. Gradient confidence measures of this edge-tracking were available when automatic edge-tracking fitting was used. Hand-corrections were used as an override if needed. Thus a tongue curve was created for each frame of interest with at most 42 coordinates. We chose the AAA option to smooth each curve slightly to avoid tracking noisy aspects of the image too closely.

In this study, each word spoken (for materials, see below) provided a single representative tongue curve for one target liquid. These were averaged in AAA to provide a mean curve, a process which also provides the standard deviation of the location of the contributory tag points along each fanline.
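The averaging step can be sketched as follows. This is not the AAA implementation; it is a hedged illustration that assumes each token's curve is stored as one radial distance (in mm) per fanline, with `None` where no tongue surface was visible, and that the sample standard deviation is wanted. The function name `mean_curve` and the data layout are hypothetical.

```python
from statistics import mean, stdev

def mean_curve(curves):
    """Return a (mean, sd) pair per fanline from several token curves.

    curves: list of lists; curves[t][i] is the distance (mm) of the tag on
    fanline i in token t, or None if the surface was not visible there.
    """
    n_fanlines = len(curves[0])
    summary = []
    for i in range(n_fanlines):
        values = [c[i] for c in curves if c[i] is not None]
        if not values:
            summary.append((None, None))       # no token has data on this fanline
        elif len(values) == 1:
            summary.append((values[0], None))  # sd is undefined for a single value
        else:
            summary.append((mean(values), stdev(values)))
    return summary

# Three hypothetical tokens measured on three fanlines (values in mm).
tokens = [
    [50.0, 52.0, None],
    [52.0, 54.0, None],
    [54.0, 53.0, 40.0],
]
for i, (m, sd) in enumerate(mean_curve(tokens)):
    print(i, m, sd)
```

Averaging along fixed fanlines keeps the comparison point-for-point across tokens, which is why the standard deviation can be reported separately for each radius.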

#### 2.2 Speaker and protocol

The male adult speaker and the second author are both (multilingual) native speakers of the Central Travancore dialect of Malayalam. The second author explained the materials and the orthographic conventions for the Latin script presentation (Table 1) orally and using Malayalam orthography. The speaker was familiar with these conventions. Latin script had to be used for software reasons. She also monitored data collection to make sure there were no mistakes, intervening when needed to elicit a correction. The speaker also corrected himself on a few occasions, and showed a high level of awareness of his target productions. Words were recorded in pairs for speed of data collection – an audible beep accompanied the appearance of the prompt on screen, and the speaker then said the word twice with a small pause in between each token. The speaker's bite plane was also imaged to enable rotation of images to the occlusal plane, and images of swallowing were captured to provide an indication of the location of the alveolar ridge and hard palate.


#### 2.3 Materials

Table 1 – Materials, a mix of real words and pseudo-words (underlined), with a count of the number of tokens elicited (and analysed). Italicised words are not analysed here.

The materials were designed to elicit minimal sets of all five liquids, which is possible only in intervocalic position, with multiple repetitions, within 20 minutes to avoid speaker discomfort, and in mainly low vowel and non-lingual consonantal contexts to reduce coarticulation effects. The liquids were mainly elicited between /a/ vowels, or in an /a\_\_i/ context, with a few other tokens in other contexts to provide further detail. Pseudo-words were kept to a minimum, and filled out the /a\_\_i/ frame. For methodological simplicity labial consonants (or none) in the carrier words were preferred, to avoid unwanted lingual coarticulation. No carrier phrase was used, again to minimize coarticulation and to speed up data collection. In addition, a passage was elicited (presented in Malayalam orthography). This has not yet been analysed.

At least four tokens of each (pseudo) word were captured, with 8 tokens for some. The number of tokens elicited for words formally analysed in this paper is also given in Table 1 (in bold typeface). We will focus here on the core /a\_\_a/ context (with no other consonant or a labial consonant) which provides most tokens, though, looking ahead, it is clear that words with /k/ or with other vowels show similar behaviour.

#### 2.4 Annotation and data extraction

Figure 2 attempts to convey, in a single composite image, the way a tongue surface changes shape and location in time in the mid-sagittal plane, by presenting an overlay of a temporal sequence of tongue curves, using /ɾ/ as an example. Each of these curves has been semi-automatically traced over an image like the one in Figure 1b using AAA software. During the actual annotation process, however, a time sequence of un-traced raw images was examined one at a time in a moving sequence, manually-controlled, from which one frame was chosen on a holistic basis as being the one containing a tongue shape best characterizing the target. Thus normally from each word only one tongue curve was drawn, on the target frame. It was then extracted and averaged together with shapes from other tokens of the same target. Plotting the *average* target enables an accurate comparison of the position and shape of each of the five liquids.

The basis for selecting a single frame from the ultrasound sequence was as follows. The frame chosen (by the first author) was the one which seemed to characterise the consonantal constrictions as a whole, but with primacy given to the anterior articulation. For /ɭ/ and /ʐ/ the frame chosen was the one with the most strongly retracted and raised blade, often achieved for just a single frame or two, while the other three liquids were captured at a frame of a stable (alveolar) constriction. For the rhotics and /l/, if many frames seemed equally characteristic of the anterior articulation, the most extreme root articulation dictated the choice of frame.

Figure 2 also shows the overall orientation of the tongue shape. It has been rotated so that the speaker's occlusal plane is horizontal (at 48 mm high). The location of the hard palate can be estimated by examining the images of swallowing, because the tongue presses up against the hard palate during the swallowing action. A palate shape can be superimposed on plots of the lingual targets in the AAA workspace (see Figure 4 below). The target shape for the /ɾ/ in Figure 2 corresponds to the last in the sequence plotted (i.e. with the highest blade). It was the curve used as the basis for calculating the mean tongue configuration for /ɾ/. The /a/ curve, however, is for illustration – in this token it was early in the vowel, but no particular convention was used to identify it. Overall, Figure 2 conveys something of the nature of tongue movement and how segmental targets can be identified; in this case an intervocalic liquid.
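Rotation to the occlusal plane amounts to a rigid rotation of every curve by the angle of the imaged bite plane. The sketch below illustrates that geometry under stated assumptions: the bite plane is represented by two hypothetical points (`bite_back`, `bite_front`) in the same coordinate space as the tongue curve, and anterior is to the right. It does not reproduce the procedure used inside AAA.

```python
import math

def rotate_to_occlusal(points, bite_back, bite_front, origin=(0.0, 0.0)):
    """Rotate (x, y) points about `origin` so the bite plane becomes horizontal."""
    # Angle of the bite plane relative to the horizontal axis.
    angle = math.atan2(bite_front[1] - bite_back[1], bite_front[0] - bite_back[0])
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)  # rotate by the opposite angle
    ox, oy = origin
    rotated = []
    for x, y in points:
        dx, dy = x - ox, y - oy
        rotated.append((ox + dx * cos_a - dy * sin_a,
                        oy + dx * sin_a + dy * cos_a))
    return rotated

# Hypothetical bite plane tilted at 45 degrees; after rotation it lies flat.
flat = rotate_to_occlusal([(0.0, 0.0), (10.0, 10.0)], (0.0, 0.0), (10.0, 10.0))
print(flat)
```

Applying the same rotation to every frame and to the palate trace keeps all shapes in one speaker-specific frame of reference, so heights above the occlusal plane can be compared across liquids.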

Figure 2 – Example of the dynamic movement from /a/ into the tap /ɾ/, rotated so that the occlusal plane is horizontal. Curves are 10 ms apart in time, and this example lasts 160 ms. The direction of movement in time is shown by the large arrows. The origin of the measurement space is arbitrary – the lower edge of the upper teeth is approximately at (110, 50). Tick marks are at 20 mm intervals. The same axes are used in all figures, to enable comparison of pharyngealisation and palatalization.

Figure 2 is not merely illustrative, however – it is also a useful way of conveying the dynamic and coarticulatory aspects of the articulation of an intervocalic liquid, as we will see below.

Note that as the tongue tip raises, an air pocket may appear under the tip. This sub-lingual cavity is particularly important as a resonant chamber, influencing the acoustic properties of the liquid sounds – but unfortunately air (and hence the sublingual cavity) is impenetrable to ultrasound at the frequencies used in scanners. A raised tip is likely to be invisible to ultrasound: the corollary is that neither the anterior termination of the tongue surface in an ultrasound tongue image nor the right-hand end of the curve traced from it will necessarily correspond to the tongue tip itself, just to the most anterior part of the blade which is imageable. Moreover, when the tongue is in contact with the hard palate or alveolar ridge, the discernibility of the surface often diminishes.

Finally, by measuring the distance between the curves in Figure 2 it is possible to estimate the speed of the tongue blade surface as it moves through the oral cavity. Just a single typical token of each liquid was quantified, as an indicative measure. We measured the speed of the blade orthogonal to the direction of travel – thus most segments were examined moving along a trajectory that was roughly vertical in these figures, whereas the forward movement of the retroflex flap was captured in the analysis of its motion (see Figure 9). From these tokens, a rough articulatory duration of 'time at the target' was also calculated, based on the number of frames where the tongue blade was moving slower than a threshold of 30 mm/s.
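The speed and 'time at the target' estimates can be sketched as below. This is a minimal illustration, assuming the blade position has already been reduced to one displacement value (mm) per frame along the chosen trajectory, and that frames are 10 ms apart as in Figure 2. The function names and the example positions are hypothetical; only the 30 mm/s threshold is taken from the text.

```python
FRAME_INTERVAL_S = 0.010   # curves are 10 ms apart
THRESHOLD_MM_S = 30.0      # 'at the target' if moving slower than this

def blade_velocity(positions_mm, dt=FRAME_INTERVAL_S):
    """Frame-to-frame velocity (mm/s); positive values mean upward movement."""
    return [(b - a) / dt for a, b in zip(positions_mm, positions_mm[1:])]

def time_at_target_ms(velocities, threshold=THRESHOLD_MM_S, dt=FRAME_INTERVAL_S):
    """Total duration (ms) of the frame intervals whose speed is below the threshold."""
    slow_intervals = sum(1 for v in velocities if abs(v) < threshold)
    return slow_intervals * dt * 1000.0

# Hypothetical blade heights (mm) across eight frames of one token.
heights = [40.0, 41.0, 43.5, 45.0, 45.2, 45.1, 44.0, 42.5]
velocities = blade_velocity(heights)
print([round(v) for v in velocities])
print(time_at_target_ms(velocities), "ms at target")
```

In this sketch every slow interval is counted, so a real analysis would first restrict the window to the neighbourhood of the constriction to avoid counting steady portions of the flanking vowels.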

#### 3. Impressionistic results

The impressionistic phonetic realisations of our single speaker's rhotics and laterals were broadly comparable to those in Punnoose (2011) and Punnoose et al. (2012). Primary place and manner were as expected, with the following exceptions: the tap /ɾ/ was sometimes quite stop-like and sometimes fricated; the trill /r/ was often undershot, as seems typical for phonemic trills (Jones submitted); the lateral /l/ was sometimes fricated.

In terms of secondary resonance, the /ɾ/ and the /l/ both sounded relatively clear while the /r/ and the /ɭ/ were both impressionistically darker in resonance, as Punnoose and colleagues have found. The 5th liquid /ʐ/ sounded post-alveolar, neither strongly clear nor dark, it was often fricated, and had some mild rhotic qualities in the approach to maximum constriction (cf. the temporally asymmetrical frication and formant movements in Figure 3).

Figure 3 – Example token of azha /aʐa/ showing temporal asymmetry in its formant movement and frication.

When the liquids were before /i/, mostly there were only small differences between /aCa/ and /aCi/: /ali/ was more consistently fricated than /ala/, but the number of tokens is too small to draw conclusions from. Bigger auditory differences from /aCa/ were observable for liquids following /i/, for the small dataset available. Only the clear liquids appear in this context, and all were fricated.

In /aɾi/, as with /aɾa/, the tap often was stop-like, but in /iɾa/, it was more like a short fricative or a fricated tap. The alveolar /l/ in /ila/ was strongly fricated, probably due to the high tongue position. The /i/ in /iʐa/ sounded less peripherally high and front than the /i/ in /iɾa/ and /ila/, and there were formant differences.

#### 4. Articulatory results

#### 4.1 Single frames

Overall, a different spatial tongue shape was found for each liquid (Figure 4), in addition to dynamic differences which will be explored more below.

Bearing in mind that the tip might not have been imaged, in Figure 4 we can see a slightly tighter constriction between what is probably the alveolar region and the tip/blade for /l/ than for the other liquids – the blade is pretty parallel to the occlusal plane, and lies about 12 mm above it. The comparable part of the tongue in both rhotics appears to lie about 3 mm lower. In the tap, as with /l/, the blade is pretty flat (then raises slightly in the dorsal area), whereas for the trill, the blade slopes 'downwards', about 6 mm closer to the probe. Strong retroflexion is clear in /ɭ/, causing some artefacts that result in the surface appearing to pass through the hard palate<sup>1</sup>. The 5th liquid /ʐ/ is clearly postalveolar and laminal.

Clear pharyngealisation can be seen in the dark /ɭ/ and /r/, with root retraction of about 1 cm compared to the three other liquids. Slight raising of the front of the tongue, which may be a type of weak palatalization, can be seen in the clear /ɾ/ and especially /l/, and also in the 5th liquid /ʐ/ in addition to its postalveolar close constriction. The 5th liquid looks quite like a bunched tip down rhotic (Lawson et al. 2011).

Figure 4 – Averaged tongue shapes for the five liquids between /a/ vowels. Thick line indicates the mean, flanked by ±1 s.d. (In a colour reproduction, the rhotics are in red, laterals in blue, 5th liquid in green.)

Similar locations and shapes can be seen for the liquids in the contexts of the other vowels examined, with some slight coarticulatory differences.

<sup>1</sup> Recall that the retroflex lateral may in fact have a sublaminal contact, but since we are unable to detect the 'looping back' in static images, we have to 'join the dots' to give the impression of a supra-laminal contact.

#### 4.2 Dynamic analysis

A dynamic qualitative articulatory analysis was undertaken, and the results will be conveyed using typical single tokens below. A tongue curve was traced onto every frame from a stable vowel position before the liquid to one after it. The tongue roots are generally dynamically active, except in /aɭa/, in which only the blade moved, as part of the rapid forward moving flap. Overall, the rhotics (/r/, /ɾ/) and laterals (/l/, /ɭ/) had relatively simple motion paths which are well-represented in the figures, but the 5th liquid /ʐ/ had a more complex blade motion, which we have tried to represent with a bendy arrow.

In each of the figures below, representing a single token, the left panel represents the movement from the preceding vowel to the target, and the right panel from the target through to the following vowel.

Figure 5 – Example of the dynamic movement into and out of the clear tap /ɾ/ in an /a/ context. The left frame shows the closing gesture from /a/ into the target frame for /ɾ/, and the right frame the opening gesture from /ɾ/ into /a/. Curves are 10 ms apart in time, and the arrow shows the direction of movement in time. These conventions apply to the other dynamic figures below.

The slightly wider spacing of the traces in the left panel of Figure 5 near the tip, in the start of the tap, shows the most rapid movement in the whole sequence, as the blade and tip move rapidly upwards. The tongue does, however, stay in the constriction location for a couple of frames – the shortest constriction is around 20 ms.

Figure 6 – A single token of the dynamic movement into and out of the dark trill /r/. The left panel includes more than one raising-lowering cycle of the blade (and tip), indicated by the thin arrow.

One of the most striking things about the trill is the stability of the root in the /a\_\_a/ context. In the left panel, it is perhaps not obvious that there are two trill events in this /r/, as the tip raises quickly to the maximum height (shown by the widely spaced curves), lowers by a few millimetres, and raises again. The time spent in this location for one constriction of the trill is short, around 30 ms. The up-and-down motion of the blade can be easily seen in a velocity trace (Figure 7). We can see rapid upwards movement of around 250 mm/s at 50 ms, followed by a downward-upward-downward-upward trilling motion between 75 ms-140 ms approximately, followed by downwards movement towards the next vowel from 150 ms.

Figure 7 – Velocity upwards (positive values) and downwards (negative values) of the blade in trilled /ara/.

The alveolar lateral (Figure 8) is clearly palatalized, with a consistent speed of transition. The time spent at the constriction is around 70 ms, around twice as long as the other liquids. The rather 'pointy' palatalization may be an imaging artefact.

Figure 8 – A single token of the dynamic movement into and out of the clear alveolar lateral /l/.

The dark retroflex lateral, like the dark trill, shows a highly stable tongue root in the /a\_\_a/ context (Figure 9) with a retraction and raising of the blade (and perhaps inversion of the tip)<sup>2</sup>, followed by a very rapid forward flapping motion. The time spent at the maximum retracted location is brief, around 30 ms. The forward motion of the tongue in the right panel shows analysis artefacts that could be resolved by smoothing multiple tokens: the tongue will actually be moving forwards evenly, but its location is only represented in the raw data along the echo-pulse beams giving rise to the clumping when the tongue surface is nearly parallel to those beams (Wrench & Scobbie 2011).

Generally, the speed of the movement of the tongue into and out of the constrictions has a peak of around 100-150 mm/s (about 50 mm/s faster in the closing gesture than the opening gesture), except for this flap, in which we estimate a closing speed of 200 mm/s and a forward flapping speed of around 400 mm/s. Only the 5th liquid moves as fast in its closing gesture.

Figure 9 – A single token of the dynamic movement into and out of the darker retroflex lateral /ɭ/. A more detailed path of movement is shown by the thin arrows.

<sup>2</sup> It is possible, recall, that there is sublaminal contact here which we cannot represent, though we think on consideration that the contact is supralaminal.

Finally, the 5th liquid has rapid movement in the closing phase, with typically rhotic retraction and raising in a curving path, albeit weakly, and not in a retroflex way. The release of the constriction is, however, rather extraordinary, showing a zig-zag motion that is hard to convey in these diagrams. The movement is, we think, indicative of a change in shape of the blade and tip of the tongue. It may have to do with a transition between a more grooved central and more lateral or slit-like airstream, retraction of the tip, or some other changes in lingual shape. Whatever is the source of this strange movement, we presume the dynamic changes are not accidental, but are associated with the partially rhotic and partially lateral nature of the segment. The opening gesture starts with a downward, slightly retracting opening, quite unlike the tip-down bunched /r/ seen in Lawson et al. (2011). Moreover, at the end of the 5th liquid, in fact in the following /a/, the lowered blade extends forwards again without lowering more, indicating it has been previously retracted into the tongue body. The tip also appears to elongate forwards in /aʐi/ during the /i/. A final possibility is that this apparent motion is mainly due to the filling in of the midsagittal sublingual cavity by the underside of the tip and blade, and that the upper surface of the tongue is hardly moving forward at all.

Figure 10 – A single token of the dynamic movement into and out of the clear 5th liquid /ʐ/. The more detailed path of movement is shown by the thin arrows.

Finally, here is an example of the dynamics of an asymmetrical environment for one of the liquids, in this case /r/. Tongue root advancement and palatalization can be seen following the trill, but the root is still very stable before the trill, more so in fact than the root in /aɭi/ where the coarticulatory influence of the /i/ extends more strongly across the whole of the liquid into the preceding vowel.

Figure 11 – A single token of the dynamic movement into and out of the darker trill /r/, in /ari/. The right panel shows palatalization (raising) and root advancement in the transition towards the /i/.

#### 4.3 Coarticulation

Some of the liquids have been shown to be clear and others dark, acoustically (Punnoose 2011) and in articulation (above). Punnoose also examined the effect of vowel context on the liquids, and vice versa. With the small amount of data available here, we will chart changes to liquids, looking at the effect of coarticulation of an /i/ vowel compared to the /a\_\_a/ context presented above. We will do this mainly for /aCi/, but also mixing in the small number of /iCa/ materials.

The effects of coarticulation can be seen in Figure 12 and Figure 13 below. For the clear liquids (Figure 12) the tongue root is a little more advanced and there is some independent extra raising of the tongue into the palatal arch, particularly for /l/ and /ɾ/. The 5th liquid may perhaps become more apical and alveolar in conjunction with its slight palatalization (which seems to also include some velarisation).

Figure 12 – Mean (thick lines) and standard deviation (thin) of target /ʐ/, tap /ɾ/ and lateral /l/ in /a\_\_a/ context (solid lines) vs. mean targets from mixed /i\_\_a/ and /a\_\_i/ contexts (dashed).

The two dark liquids are shown in Figure 13. Again, the tongue root is a little more advanced and there is a little extra raising for /ɭ/, but there is little change in /r/, which is the most stable of the consonants, as shown above (Figure 11). The root of the dark consonants, in its advanced state due to the influence of /i/, is still more retracted than the root in the clear consonants.

Figure 13 – Mean (thick lines) and standard deviation (thin lines) of target for trill /r/ and retroflex lateral /ɭ/ in /a\_\_a/ context (solid lines) vs. mean from pooled /i\_\_a/ and /a\_\_i/ contexts (dashed).

#### 4.4 Summary


We have presented a qualitative analysis of our small set of articulatory data, which supports the clear/dark distinctions of Punnoose (2011) and Punnoose et al. (2012). We confirm the nature of the distinction between the tap (which is sometimes rather stop-like) and trill (which can undershoot), and between the alveolar and retroflex lateral (which appears flap-like). We also notice unusual activity in the production of the 5th liquid (which otherwise appears to be a post-alveolar approximant cum fricative), the nature of which we have not seen in our previous ultrasound investigations.


Coarticulation to /i/ was found. Impressionistically we noted the appearance of audible and spectrally-visible frication in the liquid when /i/ preceded it, though no difference in tongue shape was apparent (but beware that the sample was very small). The dark trill was particularly resistant to coarticulation, and it may be that both it and the retroflex lateral have a more braced tongue root and dorsum to support their anterior articulations (see below).

#### 5. Previous articulatory research on Tamil and Kannada

#### 5.1 The 5th liquid

McDonough & Johnson (1997) is a single-speaker study which makes a useful point of reference for our work here. They examined the five liquids in the Brahmin dialect of Tamil. This related language also has two rhotics (an alveolar and a retroflex flap); two laterals (an alveolar and a retroflex); and a 5th liquid. The aim of their small-scale study was to investigate the articulatory, acoustic and perceptual characteristics of the five Tamil liquids, in particular the 5th one, in [VCV] contexts. Via electropalatography (EPG) and static palatography, the articulation of the Tamil 5th liquid was shown to involve tongue contact on the hard palate, as is typical for retroflex sounds. However, unlike /ɽ/ there was no evidence of any forward motion during the consonant closure, and unlike /ɭ/ there was no opening at the rear lateral edge of the palate in the EPG data.

A difference was also found between /ɽ/ and /ɭ/ vs. /ʐ/, in that the linguogram and EPG data showed a mid-sagittal gap in contact between the tongue and the palate, suggesting a dip behind the main constriction of /ʐ/. Taken with their acoustic results, McDonough & Johnson describe this segment as being "an apical retroflex central approximant with static articulation, no laterality and only incidental frication" (McDonough & Johnson 1997: 22).

In our Malayalam speaker, the 5th liquid is certainly not static in its active articulator, though we cannot tell how the contact patterns change, nor whether there is any laterality. There appears, for our speaker, to be more friction, and the constriction appears laminal.

Narayanan et al. (1999) also studied sustained productions of each of the Tamil liquids in a word-final /paC/ context, though the methodologies were rather different. The first author produced each liquid, and was recorded using static palatography, magnetic resonance imaging (MRI), and electromagnetic magnetometry (EMMA). They found the 5th liquid (in their transcription system it was /ɻ/) involved an anterior tongue body articulation with the narrowest constriction in the palatal region, though the exact location varied in an inconsistent way. The 5th liquid's back cavity was larger than the other liquids in Tamil (due to root advancement) and was 'depressed' around its retroflex constriction, as had also been found by McDonough & Johnson (1997). However, unlike McDonough & Johnson's findings, both the 5th liquid and the retroflex lateral had a dynamic circular kinematic character, which is more similar to what was seen here for the 5th liquid and retroflex lateral in our Malayalam speaker, whose movement was, in addition, more complex than a simple circular movement.

Taking articulatory and acoustic data together, both McDonough & Johnson (1997) and Narayanan et al. (1999) suggest that the 5th liquid in Tamil is a central retroflex approximant, and therefore a third rhotic. However, their descriptions of the type of retroflex articulation (static vs. back-to-front) contradict each other (and our observations). Furthermore, their findings reveal perceptual and spectral similarity between this sound and the retroflex lateral, which might explain some of the controversy surrounding the identity of the 5th liquid, although acoustic evidence was presented to argue for a classification of the Tamil 5th liquid as a rhotic. We have only discussed the acoustics of the speaker here briefly, but Punnoose (2011) and Punnoose et al. (2012) explore Malayalam acoustic patterns in detail, looking at the first four formants of the liquids, along with their phonotactics, and conclude that both resonance and primary liquid manner may be equally relevant in understanding the system and placing the 5th liquid in it. This conclusion suggests that further comparative research on these related languages would be highly desirable, not least because it may be unlikely that we find the same pattern in each.

#### 5.2 Retroflexion and tongue-root stability

Such broader cross-linguistic work is under way. Recently, Kochetov et al. (2012) looked at (geminate) obstruents in Kannada and found that the retroflex stop in an /a\_\_a/ context had a fronted tongue root compared to other geminate obstruents. It is not clear yet whether this is characteristic of any other retroflexes in Kannada, or how Kannada liquids behave, just as it is not clear what happens in Malayalam retroflex obstruents. From Kochetov et al.'s figures, we estimate that the root moves forward in Kannada from a neutral or /a/-like position ~300 ms before the centre of the voiceless retroflex stop by about 5–10 mm. Contrast this with the highly stable (retracted) root in the Malayalam retroflex lateral above (Figure 9) in the same vowel context. In Malayalam there is root movement in a *front* vowel context – indeed the extent of movement in the retroflex lateral flap in an /a\_\_i/ context seems comparable to the Kannada /a\_\_a/ context, both from /a/ forwards to /ɭ/ (the shape charted in Figure 13) and then again to /i/ (Figure 14). The location of the forward-moving constriction has, however, the same anterior place of articulation as it does in /aɭa/. Kochetov et al. note that earlier articulatory research on Tamil (including Narayanan et al. 1999) had also found that the pharyngeal cavity was wider in retroflexes than in dentals (and the neutral position).

Figure 14 – /aɭi/ from the start of the word (0 ms) with 10 ms tracings and thicker lines representing the shape at the time of the most retracted blade (130 ms), at the time of an acoustic flap event at the transition between /ɭ/ and /i/ (220 ms) and when the target for /i/ was reached during the 2nd vowel (300 ms).

Since the root in Malayalam liquids seems to have a different target in clear vs. dark resonant liquids and, to a lesser extent, varies due to coarticulation with an adjacent vowel, we might expect Kannada and Tamil liquids to be similar, in a way that would be predictable from their acoustic resonance.


#### 6. Conclusions

This was a single-speaker study, with the modest aim of providing preliminary results which are likely mainly to raise questions and hypotheses for future research; but the results here do seem to fit well with the findings of Punnoose (2011), at least for the liquids in an intervocalic /a\_\_a/ context, as well as with those of Narayanan et al. (1999) for Tamil in terms of showing a rather rhotic lingual articulation for the 5th liquid. The high-speed ultrasound data has also revealed, better than other articulatory techniques, some of the coarticulatory and dynamic complexity of these sounds. Of course, we need to look at other speakers, materials, and styles, to get an idea of how articulation can vary, before we can conclude what the *key* elements of the system are. For example, for a proper phonological analysis we need to know if certain speaker groups rely on the clear/dark resonance difference as much as, or more than, differences in manner or primary place, and in what sorts of tasks and contexts. Moreover, we would need to check whether the presence of the ultrasound headset and probe, or just the recording set-up, might have affected any of the articulations here – this speaker seemed to have quite a lot of frication and to produce his taps more as short stops, and the retroflex lateral as a flap rather than as a plain approximant.

One general question for future work relates to the extremely stable nature of the root in the trill and retroflex flap in the /a/ context (and their reduced but still evident stability elsewhere, i.e. in /aCi/ and /iCa/ contexts). Is this intrinsic stabilization of the back of the tongue in liquids of this type indicative of a very high coarticulatory resistance (Recasens & Pallarés 1999; Zharkova & Hewlett 2009), perhaps because bracing is required to facilitate trilling or other complex anterior articulation (Narayanan et al. 1999)? If Recasens & Pallarés (1999) found high stability for the Catalan trill, using acoustic and electropalatography (EPG) data, then our results might confirm their comment that *unlike* the tap (our emphasis), the Catalan trill "involves a high degree of tongue body constraint" (ibid:163). On the other hand, the stability or not in this /a\_\_a/ context may be a reflection of the different clear/dark resonances of these liquids in Malayalam.

As a reviewer pointed out to us, these need not be antagonistic or independent goals. It may be the case that the characteristic dark resonances are the acoustic signature of a retracted and stabilized tongue body, a point argued previously on the basis of data on trills in Russian (Kavitskaya et al. 2009; Proctor 2011) and Spanish (Proctor 2011). The coarticulatory pressure from a nearby vowel, /a/, the language-specific characteristic of darkness, and the tendency for a stabilized root in the trill and retroflex flap may all come together to create the immobility seen in Figures 6 & 9. More work needs to be done on the effect of other vowel contexts on the Malayalam liquids to work out the relative importance of these factors.

From the existing ultrasound research on 4 speakers of Spanish (Proctor 2011), we can surmise that the trill and tap appear *alike* in showing little root movement in the /a\_\_a/ context, compared to /d/ and /l/, but it is hard to be sure given the narrower field of view of Proctor's data, in which less of the root is imaged. As noted above, our Malayalam data shows a clear dynamic difference between the dark trill and clear tap in this vowel context. Proctor focuses on the 'dorsum' or 'body' in the liquids, noting "during the production of the trill, in contrast to the obstruent, the tongue body moves up and forward – away from the articulatory target of the context vowel – which suggests that this movement is intrinsic [to /r/]" (ibid:457). He also states that dorsal advancement is seen in the tap and lateral. In an /e\_\_e/ context, however, the lateral and tap are stable, while the dorsum retracts as the tongue moves from the preceding vowel into the trill. The location of this intrinsic dorsal target varies from liquid to liquid in Spanish: the lateral has an advanced dorsum, the trill a retracted one, and the tap is intermediate. We saw above (Figure 11) that the tongue root and dorsum in Malayalam also move in the release of the trill towards a following front vowel. The targets for the liquids, as in Spanish, show coarticulation from the flanking vowel (Figures 12, 13). The root is more advanced next to /i/ in all five liquids, but while the dorsum is raised in the three clear ones and in the dark retroflex lateral tap, it is not raised in the trill. The trill is therefore the most stable liquid, and in fact this is comparable to Spanish when considering *just* /e/ and /a/ contexts: coarticulation in the trill is only evident once /u/ is taken into account, which we cannot do here. Proctor's quantification of coarticulation suggests all three liquids' dorsal constrictions vary to similar degrees, something else we cannot examine in our own limited data.

Russian is interestingly different to Malayalam since it has contrastive clear / dark resonances. Briefly, Proctor's (2011) study of 4 speakers found the dark (non-palatalised) trill and lateral were more dorsally stable across different vowel contexts than a non-palatalised /d/, indicating greater stability in that area. Clear liquids also showed more dorsal stability than the comparable clear obstruent. In sum: the clear liquids (and obstruent) had a palatal ('anterior dorsal') target; the non-palatalised /l/ had a uvular-pharyngeal target; and the non-palatalised trill was rather intermediate with a backish target.

In general, if we could get more articulatory data to augment the mid-sagittal view, it would be particularly useful for understanding the laterals and the 5th liquid. In the AAA multi-channel system it would be easy to add synchronized lip video (60 fps) and EPG (200 fps), which can be captured simultaneously with hs-UTI. EPG gives excellent spatio-temporal information on anterior tongue-palate contact for taps and trills, as shown by Recasens & Pallarés (1999), and centre-only contact to indicate the presence of laterals (cf. Scobbie & Pouplier 2010 for English), and both these studies also show how EPG can be used to detect secondary articulation.

Additionally, it would be very useful to get some coronal section data from ultrasound, in an attempt to understand how the tongue surface of the tip and blade deforms, stretches and moves to enable lateral airstream(s) and hence alter the resonance characteristics of the complex oral cavity tube(s). However, as with the data here, the sublingual cavity would prevent us gaining a complete view of the tongue tip, and coronal scans cannot be made simultaneously with the mid-sagittal scans with current equipment. A high-speed MRI system might also be able to capture the relevant articulation.

We think such extra information would be particularly useful for understanding Malayalam's 5th liquid. The tongue surface data we have seen, augmented by our visual inspection of tongue-internal features, suggests that there are volumetric changes in the tongue blade that the mid-sagittal curves simply do not capture well. The tip seems to extend forward during the following vowel, suggesting it has been retracted during the liquid. While a flesh-point tracking technique like EMA (electromagnetic articulography) would be very useful in showing how the blade and tip upper surface might be extending, it would be harder to get dynamic data on how the tip might be thinning or lowering laterally, without, that is, the challenge of fixing a coil to the sensitive sublingual surface. Finally, it may be the case that a categorical analysis of the 5th liquid as either a rhotic or lateral liquid is not desirable, phonologically. It is, after all, an ambiguous segment. Its phonetic characteristics would not seem to be predicted by a simple formal phonological classification of this segment as being 'rhotic', or not. We do of course need more articulatory data. If this 5th liquid does indeed have the unusual kinematic properties which are suggested here, i.e. a movement path which we have not observed in typical rhotics or laterals, this would perhaps help to explain its ambiguous status in Malayalam's large liquid set.

#### Acknowledgements

We would like to acknowledge technical help in recordings from Steve Cowen. For both the ultrasound data capture set-up and continuing tweaks to the AAA analysis software we extend our thanks to Alan Wrench. Special thanks are due to our speaker for his generous and gracious participation. Finally an appreciative thank you to the organizers of the *'r-atics-3* conference and editors of this volume for their inspirational energy and enthusiasm, and to two anonymous reviewers for their insightful comments.



## The many faces of /r/

Mary Baltazani<sup>1</sup> & Katerina Nicolaidis<sup>2</sup>

<sup>1</sup> University of Ioannina

<sup>2</sup> Aristotle University of Thessaloniki

#### Abstract

Acoustic and articulatory (EPG) examination of the Greek rhotic in several prosodic positions (singleton phrase initially, word initially and word medially, also in /Cr/<sup>1</sup> clusters and /rC/ sequences) revealed a single constriction of short duration suggesting a tap articulation. This contained a vocalic part in /Cr/ and /rC/ contexts, but interestingly, also in phrase initial position when the rhotic was followed by a vowel. The constriction phase had a fairly stable duration and was shorter than the vocalic part, whose duration depended on prosodic position and context: it was longest phrase initially, next longest in /rC/ sequences and shortest in /Cr/ clusters. Finally, the vocalic interval's formant structure was typically similar to that of the nuclear vowel, but with more centralized formant values. We hypothesize a vocalic gesture upon which the rhotic is superimposed. Articulatorily, the place and degree of constriction of the tap varied as a function of prosodic position, context and speaker.

#### 1. Introduction

The phonetic variability of rhotics across and within languages has been noted repeatedly (e.g. Lindau 1985; Ladefoged & Maddieson 1996; Catford 2001). This variability in realization has been the sole subject of the *r-atics* conference, now in its 3rd occurrence, contributing a large body of evidence on the many faces of /r/ in language after language (e.g. Demolin 2001; Docherty & Foulkes 2001, among others). Apart from sociolinguistic context, variation has been reported, from a more phonetic viewpoint, as a function of phonetic context, prosodic position and speech rate (Lindau 1985; Inouye 1995; Recasens & Espinosa 2007). This paper follows the phonetically oriented rather than the sociolinguistic methodology in reporting on the variability of the Greek rhotic as spoken in Standard Modern Greek.

<sup>1</sup> Throughout the paper, the symbol /r/ is used for the Greek rhotic for practical reasons.

In the Greek literature, recent laboratory studies describe the rhotic as a tap in intervocalic position (Nicolaidis 2001; Baltazani 2005, 2009) or in initial and intervocalic position (Arvaniti 1999). All studies have reported considerable variability in its acoustic and articulatory characteristics. Both a tap and an approximant realization have been observed (Nicolaidis 2001; Baltazani 2005, 2009) and its place of articulation has been reported to vary across alveolar, retracted alveolar, and postalveolar positions (Nicolaidis 2001).

The most recent studies have documented the presence of a vocoid between the rhotic and the consonant in /Cr/ clusters and /rC/ sequences, and more interestingly, in phrase initial position when /r/ is followed by a vowel (Nicolaidis & Baltazani 2011, 2013; Baltazani & Nicolaidis 2013, collectively referred to henceforth as N&B). While no other study, to our knowledge, has reported a vocoid accompanying a singleton /r/ in phrase initial position in other languages, several studies have detected a vocoid in /Cr/ clusters and /rC/ sequences in Catalan, several Spanish dialects, Romanian and Hungarian (e.g. Bradley & Schmeiser 2003; Bradley 2004; Recasens & Espinosa 2007; Vago & Gósy 2007; Savu 2013).

Arvaniti (2007) claims that this more complex articulation of /r/ in Greek indicates trill production in clusters while Baltazani (2005, 2009) interprets it as a tap with a vowel-like transition. The electropalatographic data reported in N&B typically show one constriction present, providing evidence of a tap articulation.

There are two types of cross-linguistic accounts for /r/, especially in consonant clusters. Both assume that the vocoid is part of the nuclear vowel which underlies the whole syllable and is briefly exposed between the consonants: one accounts for this as the result of gestural overlap between the two consonantal gestures (Romero 1996; Bradley 2004; Recasens & Espinosa 2007) and the other, in a slightly different vein, hypothesizes that the unmasking is due to the tongue movement trajectory of the tap, which cocks back to gain momentum before tapping (Inouye 1995). On the other hand, Blecua (2001) argues that the vocoid is an inherent part of the rhotic based on the observation that the formant structure of the vocoid is similar but not identical to that of the tautosyllabic vowel.

The former two types of account mentioned above are not supported by our results which document a vocoid even for /r/ in vocalic environments, e.g. in phrase initial position (##rV). Instead, in line with literature on coarticulation (e.g. Öhman 1966), we hypothesize that the vocalic gesture is an integral part of the rhotic upon which the tap constriction is superimposed.

This study compares the rhotic acoustic and articulatory realization across positions explored in previous studies in N&B. It attempts a synthesis of the previous results, offering a unified interpretation of the Greek rhotic production on the basis of an analysis that studies the rhotic across several prosodic positions using a consistent experimental and methodological design. It addresses two main issues: first, the effect of context and prosodic position on the rhotic duration, articulation and the vocoid formant structure; second, on a more theoretical level, an explanation for the vocoid based on our empirical data.

#### 2. Our experimental data

#### 2.1 Method

The N&B experiments examined /r/ in real words, where possible, in the environment of all five Greek vowels, /i, e, a, o, u/<sup>2</sup>; test words were up to four syllables long. Five adult speakers, AT, TP (male) and MM, KN, RP (female), repeated the material five times at a comfortable speaking rate. Apart from phrase initial position, where the test word was uttered in isolation, test words were embedded in the carrier phrase [i 'leksi 'ine \_\_\_ a'pli] 'The word \_\_\_ is simple'. We examined /r/ in five positions: phrase initial (/##rV/), word initial within a phrase (/i#rV/, henceforth 'word initial'), word-internal intervocalic (/arV/, 'intervocalic'), in /Cr/ clusters and in /rC/ sequences (henceforth 'C-contexts' will refer to both /Cr/ and /rC/ unless only one of them is discussed). C-contexts contained symmetrical VCrV and VrCV sequences, with C = /p, t, k, f, θ, x/. In singleton /r/ conditions the /rV/ syllable was stressed, but C-context words had variable stress. The cross-experiment total was 1875 tokens.
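The reported total can be cross-checked against the design described above. The sketch below assumes one test word per cell of the design (one per position and vowel for singleton /r/, and one per consonant, vowel and cluster order for C-contexts). That cell structure is an assumption made for illustration, not something the text states explicitly, and the variable names are hypothetical.

```python
# Design factors taken from the description of the N&B experiments.
speakers = 5
repetitions = 5
vowels = 5                 # /i, e, a, o, u/
singleton_positions = 3    # phrase initial, word initial, intervocalic
cluster_consonants = 6     # /p, t, k, f, θ, x/
cluster_orders = 2         # /Cr/ and /rC/

# Assumption: exactly one test word per design cell.
tokens_per_word = speakers * repetitions
singleton_tokens = singleton_positions * vowels * tokens_per_word
c_context_tokens = cluster_orders * cluster_consonants * vowels * tokens_per_word
total_tokens = singleton_tokens + c_context_tokens

print(singleton_tokens, c_context_tokens, total_tokens)  # prints "375 1500 1875"
```

Under this assumption the singleton subtotal (375) and the overall total (1875) match the figures given in the text.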

In all experiments we simultaneously collected acoustic and EPG data using the British EPG system marketed by Articulate Instruments. The artificial palate used in this system has 62 electrodes on its surface, which are distributed in eight rows. The front four correspond to the alveolar zone, which is further subdivided into the alveolar region (rows 1 to 2) and the postalveolar region (rows 3 to 4). The back four rows of electrodes correspond to the palatal zone (Recasens et al. 1993). In addition, a separate recording of acoustic data was made on a digital recorder (Marantz PMD 660) with a Røde NT1-A cardioid condenser microphone. Acoustic data were analysed using PRAAT.
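The row-to-zone division described above can be written down as a small lookup, which is convenient when summarising contact by region. This is only a sketch: the function name is hypothetical, and the per-row electrode counts (six in the front row, eight in each of the others) are an assumption about the palate layout, since the text gives only the total of 62 electrodes in eight rows.

```python
# Assumed electrodes per row, front (row 1) to back (row 8); the text states
# only that 62 electrodes are distributed over eight rows.
ELECTRODES_PER_ROW = [6, 8, 8, 8, 8, 8, 8, 8]


def epg_zone(row: int) -> str:
    """Map an EPG row number (1 = frontmost) to the region named in the text."""
    if not 1 <= row <= 8:
        raise ValueError("EPG rows are numbered 1 to 8")
    if row <= 2:
        return "alveolar"       # rows 1-2, within the alveolar zone
    if row <= 4:
        return "postalveolar"   # rows 3-4, within the alveolar zone
    return "palatal"            # rows 5-8, the palatal zone


for row in range(1, 9):
    print(row, ELECTRODES_PER_ROW[row - 1], epg_zone(row))
```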

<sup>2</sup> For a description of the Greek vowels, see Arvaniti (1999, 2007).

We measured the durations of the rhotic constriction phase and of the vocoid, as well as the F1 and F2 formants of the vocoid and of the flanking vowel(s) to detect possible environment influences on the vocoid. The onset of the constriction phase – together with the onset of the voicebar – was marked at the offset of silence, preceding vowel or vocoid, depending on prosodic position. The offset of constriction was marked at the beginning of the formants for the following vowel or vocoid. The beginning and end of the vocoid was marked at the onset and offset of its formant structure respectively (see e.g. Figures 8 and 10). The duration and formant measurements were automatically obtained through a PRAAT script. For the articulatory analysis, the first EPG frame of maximum contact/constriction in the four front rows was annotated (Figure 1a, b) as constriction always occurred in the alveolar zone. The frame of maximum contact typically coincided with the frame of maximum constriction; in the few instances that it did not, the frame of maximum constriction was annotated. The percentage frequency of electrode activation of the entire palate, i.e. all eight rows, over five repetitions was then calculated at the frame of maximum contact/constriction for the rhotic in each test word.
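The two articulatory steps described above (locating the first frame of maximum contact in the four front rows, then computing per-electrode activation frequency over repetitions) might be sketched as follows. The data representation (each frame as a list of rows of 0/1 values, front row first) and all function names are assumptions for illustration. The sketch also uses maximum contact only; it does not reproduce the manual adjustment for the few tokens where maximum constriction fell on a different frame.

```python
from typing import List

Frame = List[List[int]]  # one EPG frame: rows of 0/1 electrode states, front row first


def front_contact(frame: Frame, front_rows: int = 4) -> int:
    """Count activated electrodes in the front (alveolar-zone) rows."""
    return sum(sum(row) for row in frame[:front_rows])


def first_max_contact_frame(frames: List[Frame]) -> int:
    """Index of the first frame with maximum contact in the front rows."""
    totals = [front_contact(frame) for frame in frames]
    return totals.index(max(totals))  # list.index returns the first match


def activation_frequency(frames_at_max: List[Frame]) -> List[List[float]]:
    """Percentage of repetitions in which each electrode is activated.

    Expects one frame per repetition (the frame of maximum contact),
    covering the whole palate.
    """
    n = len(frames_at_max)
    template = frames_at_max[0]
    return [
        [100.0 * sum(f[r][c] for f in frames_at_max) / n for c in range(len(row))]
        for r, row in enumerate(template)
    ]
```

With five repetitions per test word, each cell of the result would take one of the values 0, 20, 40, 60, 80 or 100.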

#### 2.2 Results

#### 2.2.1 Articulatory results

The articulatory analysis showed that the Greek rhotic is produced with a single constriction of short duration, both in C-contexts and in singleton /r/ positions, suggesting a tap articulation (Figures 1a, b, 8). Some tokens involving trill production were found but they were very few across contexts/positions (for details see N&B).

Figure 1(a, b) – Acoustic and electropalatographic data for the rhotic in [le'pres] above and ['fortos] below (speaker TP). The annotation line corresponds to the first frame of maximum contact/constriction in the alveolar zone and the corresponding palatogram is shown at the top right of the display. A single tap gesture is evident for the rhotic in both tokens (see also palatograms and contact totals displays below the spectrograms).

However, variability in the articulation of the tap was evident across the data as there were tokens with complete constriction and tokens with incomplete constriction. The latter ranged from very constricted to very open articulations. These patterns related to variability in the acoustic signal. For tokens with complete constriction, there was evidence of a stop-like pattern frequently with a burst present (Figure 2a). For tokens with incomplete constriction, undershoot was manifested variously: a stop-like pattern but with no abrupt discontinuity at release, i.e. no burst (Figure 2b), noise/breathiness during constriction (Figure 2c), or formant structure indicating approximant production of the rhotic (Figure 2d).


Figure 2(a-d) – Differences in the degree of constriction of the rhotic. Complete constriction in [ma'rika] (a) and incomplete constriction in [ma'ruli], ['rama] and [ma'rika] (b, c, d), and variation in the acoustic signal (see text for details).

The degree of constriction was influenced by several factors. First, an effect of singleton vs. cluster/sequence production was found, as most tokens with incomplete constriction were found for singleton /r/ (63%, 236 out of 375). Second, there were more tokens with incomplete constriction in heterosyllabic /rC/ (57%, 426 out of 749) than tautosyllabic /Cr/ contexts (47%, 351 out of 748).
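As a quick arithmetic check, the rounded percentages above can be recomputed from the token counts given in the same paragraph. The dictionary keys and variable names are hypothetical labels used only for this sketch.

```python
# (incomplete-constriction tokens, total tokens) as reported in the text.
counts = {
    "singleton /r/": (236, 375),
    "/rC/": (426, 749),
    "/Cr/": (351, 748),
}

percentages = {
    label: 100.0 * incomplete / total
    for label, (incomplete, total) in counts.items()
}

for label, pct in percentages.items():
    print(f"{label}: {pct:.1f}% (reported as {round(pct)}%)")
```

The recomputed values round to 63%, 57% and 47%, in agreement with the reported figures.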

Third, for singleton /r/, prosodic position had an effect on degree of constriction. More tokens with incomplete constriction were present for word initial position, i.e. 78% in comparison to 57% for phrase initial and 54% for word medial (Table 1, see also Figure 6).

Finally, for /r/ in C-contexts, overall more tokens were produced with incomplete constriction in the context of a fricative compared to a stop, i.e. /fricative-r/ 49% and /r-fricative/ 67% compared to /stop-r/ 44% and /r-stop/ 47% (see Figure 5; note speaker variation in Table 1).

Table 1 presents the numbers of tokens produced with incomplete constriction for singleton /r/ and C-contexts for all speakers. In addition to the variation noted above, large speaker variability is evident. For instance, for speakers KN, AT and RP more productions involved incomplete constriction systematically across conditions compared to MM and TP. This suggests different speaker strategies in rhotic production.


Table 1 – Number of tokens showing incomplete constriction for singleton /r/ and C-contexts.

These values should be considered with caution, as it is possible that complete contact may not have been registered for some tokens due to the sampling rate of the EPG system (10 ms). Observation of the EPG and acoustic data indicates that, if this has occurred, it involves a very limited portion of the data (tokens involving very constricted productions) as there were clear differences in the acoustic waveform among tokens produced with complete and incomplete constriction. Further analysis can estimate such cases more precisely. Such a shortcoming is expected to affect /r/ production to a similar degree in all contexts, as it is random. Thus, although it may result in less accurate absolute values, it is not expected to affect the accuracy of the differences reported across conditions.

With reference to the place of articulation of the rhotic, the constriction location in the alveolar zone, i.e. the four front rows of electrodes, was found to vary as a function of context, prosodic position and speaker.

Figure 3 illustrates the influence of the vocalic context on the place of articulation of the rhotic. Overall, more advanced production was evident in the front vowel contexts /i, e/. More retracted articulation was generally present in the rest of the contexts, with several tokens showing greatest retraction in the context of /a/ and/or /o/. The data therefore showed articulation in the alveolar zone, but the precise place of rhotic articulation varied from alveolar through retracted alveolar and advanced postalveolar to postalveolar, depending on the vocalic context.

Figure 3 – EPG palatograms displaying percentage frequency of electrode activation over five repetitions during the production of the /r/ in word-internal intervocalic position (top) by speaker AT, in /Cr/ clusters (middle) and /rC/ sequences (bottom) by speaker MM.

The consonantal context also influenced /r/ production in C-contexts. More fronted production was overall evident in the context of the dentals (Figure 4).



Figure 4 – Production of /r/ in /θr/ and /xr/ clusters by speaker RP.

As noted above, context also had an effect on the degree of /r/ constriction in C-contexts. Overall, more open articulations were present in the context of fricatives (Figure 5).


Figure 5 – Production of /r/ in /rt/ and /rθ/ clusters by speaker KN.

Figure 6 illustrates variation in the degree of constriction for singleton /r/ in phrase initial and word initial position. As noted previously, more open productions were evident in the latter position.


Figure 6 – Production of /r/ in phrase initial (top) and word initial position (bottom) by speaker KN.

Finally, the speaker was an important source of variation. Inter- and intra-speaker differences in degree of contact, place of articulation and degree of constriction were found. Figure 7 illustrates such differences: /r/ is produced at a more retracted place of articulation, with more instances of incomplete constriction and greater amount of contact in the palatal zone by speaker RP compared to TP.


Figure 7 – Production of /r/ intervocalically by speakers RP (top) and TP (bottom).

#### 2.2.2 Acoustic results

In C-contexts, but more importantly phrase initially where /r/ is not part of a cluster, the rhotic structure typically involves a vocoid (Figure 8).

Figure 8 – Phrase initial tap in ['rama]. Notice the long vocoid duration (60 ms).

The vocoid was clearly evident in phrase initial position where it had the longest duration, while in word initial and intervocalic positions, it was not as easy to discern due to the flanking vowel environment. Thus in these last two positions no measurements were made. However, acoustic evidence, like discontinuities and/or an abrupt change in amplitude and formants during the pre-rhotic vowel (V1) (Figure 9), suggests the presence of a vocoid adjacent to V1 and to some degree overlapping with it even in these positions (cf. Savu 2013 for similar evidence in /VrV/ contexts in Romanian; Willis 2006 for another interpretation of these acoustic characteristics). For more details see Baltazani & Nicolaidis (2013). A possible alternative interpretation may account for such acoustic manifestations during the vowel as solely resulting from coarticulatory influence from the flanking vowel. Still it is interesting to note that there are frequently abrupt discontinuities present resulting in a vocalic interval that is relatively separate and of a remarkably similar duration to the vocoids found in other prosodic positions.

Figure 9 – Acoustic evidence for a vocoid in intervocalic position. The last 1/3 of V1 in [ma'rika] shows a discontinuity and a change in formants.

The acoustic measurements revealed variability in the vocoid production, which ranged from a modal vowel to a breathy/whispered one (top and bottom of Figure 10). Finally, there was a tendency for more tokens with whispered/breathy vocoids or frication noise during the constriction phase in heterosyllabic /rC/ sequences than in tautosyllabic /Cr/ clusters. This suggests more assimilatory effects of the following voiceless obstruent in /rC/ sequences.

Figure 10 – Modal vowel quality for the vocoid in [a'frato] (top) and breathy vowel quality in ['erçete], (bottom).

A comparison across positions shows that the vocoid has longer average duration than the constriction (Figure 11). Similar results have been found for Spanish (Bradley & Schmeiser 2003).

Figure 11 – Vocoid and constriction duration in different positions. Note that constriction duration was measured in all contexts, while the vocoid duration was measured in phrase initial and C-contexts only.

Furthermore, among the positions where the vocoid duration could be measured, shown in Figure 11, the longest occurred phrase initially, almost twice as long as that in /Cr/ clusters and considerably longer than that in /rC/ sequences. We attribute the long vocoid duration in phrase initial position to the effect of initial strengthening. Differences in vocoid duration in C-contexts are attributed to differences in syllabic affiliation, as /Cr/ are tautosyllabic and /rC/ heterosyllabic and thus the spatio-temporal coordination of gestures may differ in the latter (see also Recasens & Espinosa (2007) for a review of similar findings for vocoid duration in /Cr/ and /rC/ contexts in Spanish and Catalan). The consonantal constriction was longest for word initial and /rC/ sequences and shorter for phrase initial, intervocalic and /Cr/ clusters (Figure 11). Note, furthermore, that, unlike the vocoid, the differences in the duration of the constriction across prosodic positions are small, with only 8.5 ms difference between the average longest and shortest duration.

These comparisons indicate that the different positions/contexts exert an asymmetric influence on the two components of the rhotic. One possible reason for such asymmetries relates to their articulatory nature. The tap, which has been described as a short ballistic gesture in the literature (Lindau 1985; Ladefoged & Maddieson 1996; Recasens & Espinosa 2007), is not as free to lengthen as the vocoid.

A comparison of the vocoid quality in the singleton vs. C-contexts revealed that across prosodic positions the vocoid formants (measured in Hz) are similar to those of the nuclear tautosyllabic vowel and somewhat more centralized (Figure 12). The amount of centralization varied across prosodic positions, vowels and gender. In phrase initial position female speakers showed a smaller degree of centralization than males, while in /Cr/ clusters the opposite trend was observed. In /rC/ sequences, on the other hand, the amount of centralization was relatively similar across genders. On the whole, centralization was more pronounced across genders and vowels for /rC/ sequences, probably because /rC/ sequences are heterosyllabic.

Figure 12 – Comparison of vocoid formants (in Hz) to the nuclear V in different contexts for male (left panels) and female speakers (right).

Figure 13 shows considerable variability in the Euclidean distance between the vocoid and the nuclear vowel across speakers and vowels. On average, across vocalic environments, the vocoid in /Cr/ clusters has the closest formant values to the nuclear vowel.

Figure 13 – The Euclidean distance between the vocoid and the nuclear vowel across speakers and vocalic contexts.
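The distance measure plotted in Figure 13 can be illustrated with a minimal sketch. It assumes that the distance is taken in the F1-F2 plane with formants in Hz, which the text does not state explicitly; the formant values below are hypothetical and are not taken from the study.

```python
import math


def euclidean_distance(vocoid, vowel):
    """Euclidean distance between two (F1, F2) points, both in Hz."""
    return math.hypot(vocoid[0] - vowel[0], vocoid[1] - vowel[1])


# Hypothetical (F1, F2) values in Hz, chosen only for illustration.
nuclear_vowel = (700.0, 1300.0)
vocoid = (600.0, 1400.0)

distance = euclidean_distance(vocoid, nuclear_vowel)
print(f"Distance between vocoid and nuclear vowel: {distance:.1f} Hz")
```

A smaller distance corresponds to a vocoid whose formants lie closer to those of the nuclear vowel, as reported on average for /Cr/ clusters.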

#### 3. Discussion

Across contexts, articulation of the rhotic typically involved one short constriction (ranging between 11-57 ms) suggesting a tap articulation. Similar durations for taps have been reported for several languages previously (see Recasens & Espinosa 2007 and references therein). While single contact trills have also been reported before, in the case of utterance initial position they typically involve a much longer constriction phase (around 100 ms) than the one reported in the present study (see Recasens & Espinosa 2007). An interesting finding of the research reported in this study is the presence of a vocoid during rhotic production in different contexts. On the basis of this finding, two main questions addressed in this paper are: "is there a vocoid present in all different contexts?" and "why is there a vocoid?". While the presence of the vocoid has been documented in C-contexts before, useful insights can be gained from the study of phrase-initial /r/ where /r/ articulation is not affected by an adjacent consonant. Establishing the existence of a vocoid before the constriction phase corroborates the view of the vocoid as an essential articulatory component of the rhotic in singleton contexts. If phrase initial /r/ is the 'canonical' production, then the vocoid can be explained along the same articulatory principles in other contexts. The evidence above, together with the indications provided for a vocoid in word initial and intervocalic position manifested through abrupt discontinuities in amplitude and formant structure, casts doubt on an exclusively gestural overlap account (see section 1) since a vocoid is attested even without another consonant adjacent to the /r/. Instead, we propose that the rhotic is superimposed on a rhotic-specific vocalic gesture, which is necessary for the execution of the ballistic gesture (cf. Blecua 2001), i.e. the brevity/ballistic nature of the tap gesture requires an underlying vocalic gesture for its execution. Coarticulatory effects are expected in different contexts, which can account for the spatial and temporal variability present during the vocoid and constriction phases of the rhotic.

Further corroborating evidence for our proposal can be found in word-final /r/ which is produced with a vocoid after the constriction (Stolarski 2011 for Polish; Recasens & Espinosa 2007; but see Romero 2008 for a gestural coordination account). Figure 14 shows at the top panel the palindromic word [re'ver], as produced in the phrase [re'ver mu] "my cuffs" by the second author. The word-final vocoid is clear before the segment /m/ (the initial vocoid and constriction duration is 48 ms and 19 ms respectively; the final vocoid and constriction duration is 26 ms and 23 ms respectively). The bottom panel in Figure 14 shows /r/ produced in isolation with one vocoid on either side of the constriction; the initial and final vocoid duration is 36 ms and 54 ms respectively while the constriction itself, which ends with a burst followed by frication noise, lasts 28 ms (cf. Stolarski 2011 for Polish /CrC/ clusters).

Figure 14 – Top: mirror images of vocoid+constriction in [re'ver]. Bottom: /r/ in isolation.

In line with the above interpretation, i.e. that the rhotic is superimposed on a rhotic-specific vocalic gesture, the variation observed in the position of the vocoid in relation to the consonantal context in /Cr/ vs. /rC/ sequences is expected and can be uniformly explained. If the rhotic is superimposed on a vocalic gesture then the vocoid is expected to precede the rhotic constriction in /CrV/ sequences and follow it in /VrC/ contexts.

Furthermore, the formant structure of the vocoid was more centralised than the nuclear vowel, which was an expected outcome: the V-to-V gesture upon which the rhotic is superimposed includes a vocoid which is influenced through V-to-V coarticulation by the nuclear vowel to different degrees depending on the context (singleton, C-context). The influence of the adjacent vowel, especially in C-contexts, has been documented for other languages as well (e.g. Blecua 2001; Ramírez 2006).

More specifically, there was a difference between the heterosyllabic /rC/ sequences and all the other prosodic positions: in /rC/ sequences, which lack syllable coherence, both the vocoid and the constriction are longer than in /Cr/ clusters and the vocoid formants are more centralized suggesting less temporal compression and reduced spatial V-to-V overlap. However, there is C-to-r anticipatory coarticulatory influence across the vocoid both in place and degree of constriction. Interestingly, despite the longer vocoid and constriction duration, there were more tokens with incomplete constriction than in /Cr/ clusters. More C-to-r anticipatory than carryover effects, i.e. more tokens with incomplete constriction in /rC/ than in /Cr/ contexts, may relate to the more centralized quality of the vocoid in /rC/ sequences.

On the other hand, the longest vocoid duration was observed for the singleton rhotic phrase initially and the shortest for /Cr/ clusters. These findings can be interpreted as initial strengthening for the rhotic in phrase initial position, realised temporally in the vocoid but not in the constriction duration. The shortest vocoid and short constriction duration were found in tautosyllabic /Cr/ clusters suggesting temporal compression due to the closer co-ordination relations. Carryover C-to-r effects were also found across the vocoid affecting both the place and degree of constriction of the rhotic.

Our data showed variation in place and degree of constriction, duration and vocoid formants as a function of speaker, context and prosodic position. In addition, the vocoid was typically longer than the constriction. While the vocoid length showed considerable variation as a function of prosodic position and context, smaller differences were found for the constriction, something we interpret as lack of freedom for lengthening the tap constriction.

Across experiments, more than 50% of the tokens were produced with incomplete constriction, ranging from very constricted to very open articulations. A smaller percentage of productions with incomplete constriction was found in C-contexts than for singleton /r/, which suggests influence from the consonantal context. Interestingly, more tokens with reduced contact were found in word initial and /rC/ sequences where the constriction is longer. For the former, this suggests that more factors, in addition to boundary strength, regulate the amount of contact. In particular, more tokens with incomplete constriction in word-initial than word-medial position may relate to contextual influence and related gestural coordination patterns, i.e. word-initial tokens were preceded by the high vowel /i/ of the word *'leksi'* in the carrier phrase while word-medial rhotics were preceded by the open vowel /a/. A more open tongue position during /a/ may allow for a more complete ballistic gesture reaching the target for the tap. Note that difficulty in attaining closure during taps in the environment of a following /i/ has been reported in Recasens & Espinosa (2007) due to the nature of the gestures involved. More investigation is necessary for a comprehensive account of spatio-temporal variation.

Finally, the results on the contextual influence, in particular, V-to-r and C-to-r effects, indicate that the tongue coarticulates with neighbouring gestures during the production of the rhotic in Greek, in line with evidence from other languages (e.g. Recasens 1991). While the analysis presented has aimed towards a uniform explanation of /r/ production, it should be noted that further work is needed so that current and alternative interpretations can be tested and firm conclusions can be drawn. This includes statistical analyses of the different measures across positions and further qualitative analyses. These are currently underway.

#### Acknowledgements

We thank the conference organizers for their generous hospitality, the conference audience for useful comments and two anonymous reviewers for their very helpful feedback.

#### References


## Another look at the structure of [ɾ]: Constricted intervals and vocalic elements

#### Carmen-Florina Savu, University of Bucharest

#### Abstract

This study investigates the hypothesis that the rhotic segment containing one constricted interval, [ɾ], has a more complex internal phonetic structure that includes vocalic elements flanking the constriction, as suggested in classic studies, as well as more recent ones (Stolarski 2011 and references cited therein). The current experiment focuses on the quality of the vocalic elements of the sound in Romanian (contexts #rV, Cr, rC) and the acoustic analysis shows them to systematically stay mid-high and central (to front) across contexts. The paper also briefly touches on a phonological implication of this structure of the tap.

#### 1. Previous studies: Putting contexts together

[ɾ] is described as the sound involving "a fast, ballistic tongue-tip raising movement and a single, short apicoalveolar contact" (Recasens & Espinosa 2007:1). When the segment is in intervocalic position (context VrV), this is seen as a very brief constricted interval on a spectrogram.

This paper argues for the claim that the tap actually contains two vocalic elements, one on each side of this constricted interval, as pointed out in classic studies by Polish authors and recently maintained in newer ones (see Stolarski 2011 and others this author cites). Thus, I aim to show that the tap's structure is actually 'vocoid-constriction-vocoid'.

Studies indicate that when [ɾ] is bordered by a consonant on one side, while having a vowel on the other side, spectrograms show a vowel-like element intervening between the consonant and the constricted interval of the tap. This means that a vocoid appears to the left of the constriction in Cr, and to the right of it in rC. The phenomenon is consistent cross-linguistically for clusters (V)CrV and VrC(V) (see Avram 1993; Ramírez 2006; Baltazani 2009, and others).

The appearance of the vocoids has also been reported where [ɾ] has a word boundary (pause) on one of its sides instead of a consonant (see Vago & Gósy 2007, among others). The vocoid is positioned between the pause and the constriction of the tap, in word-initial and word-final /r/ (contexts #rV, Vr#). For example, a word that begins with [ɾ] actually begins with the vocoid (see Figure 3 below).

This vocalic element has received various interpretations. Ramírez (2006) labels it "epenthetic", though its systematic appearance across languages and contexts suggests that this is not the case.

Schmeiser (2009) prefers the term "intrusive vowel" because, from a synchronic point of view<sup>1</sup>, this vocoid does not add an extra syllable to the word, which is what happens with vowel epenthesis. This may be another argument against considering the vocalic elements epenthetic. Bradley & Schmeiser (2003) explain the appearance of this "intrusive vowel" as the result of a less than maximal overlap between the two articulatory gestures performed to produce the tap and the adjacent consonant. While this explanation could account for Cr and rC clusters, it does not account for the #rV and Vr# cases, where there is no other consonant in the immediate vicinity of the tap. In these cases it would be difficult to consider the vocoid as an effect of the gestural transition from one consonant to the next.

Avram (1993) and Baltazani (2009) regard it as part of another realization of /r/, different from the intervocalic tap. Note that, under this view, we would be dealing with four realizations of the rhotic: one in intervocalic position, where it is just a short constriction, another in Cr and #rV, containing a constriction and a vocoid to its left, another realization for contexts rC and Vr# (a constriction and a vocoid to its right), and yet another for contexts Cr#, #rC and CrC (presented below), with two vocoids flanking a constriction.

None of these interpretations considers the vocalic element as part of the tap proper. In what follows I attempt to show that the vocalic element observed in Cr, rC, #rV and Vr# is one of the two vocalic parts a tap normally contains. Thus, I attempt to unify the contexts described above, with seemingly unrelated phenomena, and argue that, when considered globally, they lead to just one realization of the tap.

Slavic data indicate two vocoids, one on either side of the constriction when [ɾ] does not border with a vowel at all, but only consonants or pauses. We have the opportunity to see this in the rarer contexts #rC, CrC, Cr#, which I consider to be the most important piece of the puzzle. The two vocoids appear in syllabic /r/ in Serbo-Croatian and Slovak (see Gudurić & Petrović 2005 and Pavlík 2008 respectively), as well as non-syllabic /r/ in Polish (see Stolarski 2011). Figures 1 and 2 below illustrate two examples.

<sup>1</sup> An anonymous reviewer points out that diachronically, these vocoids may add syllables to the word or, on the contrary, the reverse may happen. Indeed, this is an interesting instance of reanalysis of (part of) the vocoids as full vowels, or, in the reverse case, full vowels may be reanalyzed as parts of the tap. For reasons of space I do not elaborate on this topic here, but the interested reader may consult Savu (2012).

Figure 1 – The Polish word rdza 'rust', non-syllabic [ɾ] in #rC (from Stolarski 2011).

Figure 2 – The Slovak word navrh 'proposal', syllabic [ɾ] in context CrC (from Pavlík 2008).

What the data appear to suggest, when taking all the contexts into consideration, is that the tap's structure may include vocalic elements on both sides of the constriction. They are clearly delimited and salient on the side(s) where it does not border with a full vowel. Thus, CrC, #rC and Cr# show both vocoids because the consonants or pauses flanking the rhotic contrast with the vocoids and emphasize them. Cr, rC, #rV and Vr# show only one vocalic element, either on the left or on the right, depending on where the consonant or pause which renders the vocoid salient occurs. VrV would show only the constriction, the tap having nuclear vowels on both sides for the tap's vocoids to 'melt into'. Therefore, under this view, the structure of the tap is always the same: 'vocoid-constriction-vocoid', and the phonetic context reveals or hides different parts of it.

#### 2. The experiment on [ɾ] in Romanian

#### 2.1 Purposes

The main purpose of the current experiment is to measure the formant structure of the vocalic elements of the tap in order to determine how much their quality can vary. Another aim is to measure the mean duration of the salient vocoids and the constrictions in Romanian words. A third aim is to investigate the possibility of the structure argued for in Section 1 being detectable in context VrV as well.

#### 2.2 Setup

Recordings of Romanian words in isolation were made, containing /r/ in contexts #rV, (V)CrV, VrC(V) and VrV, where C is a stop (/p, t, k, b, d, g/) and V is one of the seven vowels of Romanian (/a, e, i, o, u, ə, ɨ/). Additionally, recordings of nonsense VrV sequences and sustained tokens of each Romanian vowel were obtained from each speaker. Clusters Cr and rC were flanked by either the same vowel on both sides, or by a vowel and a word-boundary. The idea behind this is to have the tap in the immediate vicinity of only one vowel, so as not to have it influenced by two vowels of different qualities at once. This gives the vocalic part the opportunity to have a quality as similar as possible to that of the vowel that *is* in its vicinity. For example, the way to find out how much the vocalic element can approach the quality of [i] is to include the sequences /iCri/ or /#Cri/, rather than /aCri/. Examples of words used in the experiment are given below:


The 5 participants (4 female, 1 male) read the words and sequences off PowerPoint slides 4 seconds apart<sup>2</sup> and the recording session was repeated three times for each speaker, the quality of the recordings being adequate for the purposes of the analysis. The process resulted in a corpus of 1680 words that were subject to acoustic analysis with the software PRAAT (Boersma & Weenink 2011).

<sup>2</sup> The time between slides was introduced in order to exclude coarticulation effects, especially for context #rV. Frame sentences were not used for the same reason.

#### 2.3 Results

The realization of /r/ was that of one constriction with the accompanying salient vocalic element in 86.66% of the tokens for contexts #rV, Cr, rC (1470 words). Examples are given in the spectrograms below. Other realizations included trills and approximants, but they were not included for acoustic analysis.

Figure 3 – The word /'radu/ (proper name), context #rV.

Figure 4 – The word /drog/ 'drug', context Cr.

Figure 5 – The word /porto'kalə/ 'orange', context rC.

#### 2.4 Quality of the vocalic elements

The vocalic elements have been reported to have qualities similar to that of [ə] and [ɨ] (Avram 1993; Vago & Gósy 2007; Stolarski 2011). However, they have also been reported to be similar to the nuclear vowels in their vicinity, albeit more central (Quilis 1993 cited in Schmeiser 2009; Baltazani 2009, among others).

These studies were done on languages like Spanish and Modern Greek, which do not include mid or high central vowels in their inventories. It would, therefore, be interesting to see what happens when the nuclear vowels surrounding the tap are themselves central and mid or high. This is an opportunity which a language like Romanian provides, with its /ə/ and /ɨ/. Could we narrow down the possible space of variation of the vocalic elements?

The graphs below plot the average quality of the vocalic element in each word, for all participants, all three times the recording session was repeated, as compared to the average quality of the sustained tokens of the seven Romanian vowels uttered by the same speakers.

In the three graphs, the vocalic elements (small size) match the shape of the nuclear vowel (large size) they have in their immediate vicinity. For example, the small filled triangles correspond to vocoids in sequences /#ra/, /(a)Cra/, /arC(a)/. Each small-sized symbol is dedicated to one word used in the experiment and its position on the graph represents the average formant values of the vocoid in the respective word, across participants and recording sessions. For context #rV, there were two words used per full vowel, hence two small shapes for each large one. Cr and rC have six words per full vowel, one for every stop consonant (/p, b, t, d, k, g/).
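The word counts given here can be checked against the token totals reported earlier. The sketch below is a consistency check under the assumption that every word was read once per speaker per session; the variable names are illustrative, and the interpretation of the remainder is an inference rather than a figure stated in the text.

```python
# Design figures stated in the text.
VOWELS = 7                                        # /a, e, i, o, u, ə, ɨ/
WORDS_PER_VOWEL = {"#rV": 2, "Cr": 6, "rC": 6}    # words per full vowel
SPEAKERS = 5
SESSIONS = 3
CORPUS_SIZE = 1680                                # total words in the corpus

# Distinct words in contexts #rV, Cr and rC.
words = sum(n * VOWELS for n in WORDS_PER_VOWEL.values())

# Assumption: each word is produced once per speaker per session.
tokens = words * SPEAKERS * SESSIONS

# Approximate number of tokens realized as one constriction plus vocoid (86.66%).
tap_tokens = round(tokens * 0.8666)

# Remainder of the corpus, presumably the VrV words (an inference).
remaining = CORPUS_SIZE - tokens

print(words, tokens, tap_tokens, remaining)
```

Under these assumptions the design yields the 1470 words reported for contexts #rV, Cr and rC, which suggests that 86.66% is a share of those 1470 tokens rather than of the full corpus.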

Graph 1 – Vocalic elements in context #rV.

Graph 2 – Vocalic elements in context rC.

Graph 3 – Vocalic elements in context Cr.

The graphs show that, for all contexts, the vocalic elements tend to approach the quality of the full vowels they are surrounded by, but there appear to be certain limits to this variation. They remain mid-high, central to front<sup>3</sup> and seem to consistently stay away from [a], [o] and [u]. This is especially easy to observe when looking at what happens to the vocalic parts surrounded by [ə] and [ɨ]. Having these full vowels around pushes the vocalic elements to a slightly more front area than the central vowels are in themselves, which is a strong indication that the point where the vocoids cannot reach further back is near. Indeed, as mentioned above, the vocoids corresponding to [o] and [u] are much more front than these two vowels. Actually, it appears that when the full vowel around the tap is [o] or [u], the vocoids are very similar to, or show overlap with, [ə] and [ɨ]. The vocoids are also much higher than [a] in all contexts, remaining mid.

<sup>3</sup> As pointed out by an anonymous reviewer, the quality of the tap's vocoids, as shown by the graphs, raises a phonological question: what is their featural specification, if we are to consider that the complex acoustic structure is mirrored in phonology? Are the vocoids underspecified for height and backness? One way to approach the issue would be through statistical analysis, as suggested by the reviewer, or by phonological study. I leave this matter for further research.

Though there are limits to the backness of the vocalic elements of the tap, they can be quite front, approaching [i] and, to a lesser extent, [e].

The graphs also show variation according to context. Cr allows the vocalic elements to vary and approach the quality of the surrounding vowel the most. The most front and high vocoids may be found in this context, namely those in the /(i)Cri/ sequences. The /(e)Cre/ words also appear to contain vocoids that are closer to [e] than other contexts.

Context rC keeps the vocoids closer together than Cr. However, context #rV clusters them together in a tighter, mid-central area (in agreement with Baltazani & Nicolaidis 2013).

Let us now see if the place of articulation of the consonant has an influence on the vocoid in the word in which it occurs. This is shown in Graph 4 below.

Graph 4 – Average vocoid by place of articulation of the C in Cr and rC.

In Graph 4, the smaller symbols represent the average vocoid according to the vowel flanking the cluster Cr or rC (shape of the symbol), and according to the place of articulation of the C(onsonant) in the cluster. The light gray shapes stand for the vocoids when C is a dental stop. The dark gray shapes represent clusters with bilabial stops, while the black symbols are for velar stop clusters. The vocalic elements in contexts Cr and rC have been averaged together for the same place of articulation of the C. For instance, the small light gray filled triangle represents the vocoids in sequences /art(a)/, /(a)tra/, /ard(a)/, /(a)dra/, again across participants and recording rounds.

As Graph 4 shows, there appears to be a tendency for the vocoids in bilabial stop-rhotic combinations to be slightly more back, while vocalic elements in clusters with dental and velar stops tend to be more front. That said, the C in Cr and rC clusters does not seem to have a significant effect on the tap's vocoids. The influence of the full vowel flanking each cluster is clearly stronger.

#### 2.5 Durations

On average, the measurements show that the constricted interval is shorter than the vocalic element. Table 1 shows that the vocoid in a word-initial tap (context #rV) has the longest average duration, and the difference between the vocoid in this context and in the other contexts is considerable (more than 20 ms), as reported for Greek in Baltazani & Nicolaidis (2013). The average vocoid in Cr and rC has about the same duration, while Cr has the shortest constricted interval.


Table 1 – Average durations (ms) of constrictions and vocalic elements.

The current experiment did not control for factors like speech rate, word length and stress placement, which could influence the durations, so more investigation is needed in order to elaborate further on this issue.

#### 2.6 Context VrV: formant changes

"Abrupt formant changes" have been reported during the first vowel, towards the constriction, when the tap is in context VrV (Baltazani & Nicolaidis 2013). It would be expected that this phenomenon should occur systematically if the tap has vocoids of its own in the intervocalic context as well. Specifically, one would expect the formants to change, when nearing the constricted interval, towards a configuration similar to that of a mid-high, central (to front) vowel, which is the area in which the vocoids of the tap are.

This would indeed appear to be the case, as Figures 6-8 below show for the nonsense sequences.

Figure 6 – The nonsense sequence [aɾa].

The vowel [a] has a high F1 and a low F2. Figure 6 for the sequence [aɾa] shows that, near the constriction, F1 decreases and F2 increases, making the target configuration higher and more front than [a].

Figure 7 – The nonsense sequence [iɾi].

A low F1 and a high F2 are characteristics typical of the vowel [i]. In Figure 7, showing the nonsense sequence [iɾi], a slight increase in F1 and a decrease in F2 can be noticed, which means that, towards the constriction, formants aim for a vowel that is a little lower and more back than [i].

Figure 8 – The nonsense sequence [uɾu].

The vowel [u] has a low F1 and a low F2, which are visible at the edges of Figure 8. Towards the constriction, F2 increases, suggesting a vowel which is more front than [u]. Some tokens, such as the one in the spectrogram above, even exhibited a portion in which the formants are in a steady-state configuration near the constriction, which would support the claim that the tap has vocoids that are detectable in context VrV.

Considering Figures 6, 7 and 8 together, the formant changes suggest that, in the immediate vicinity of the constricted interval, formants tend to approach the configuration that would place the vowel in the area in which the (salient) vocoids of the tap cluster in other contexts<sup>4</sup>, as indicated in Graphs 1-4. In addition to this, Figure 8 shows a token in which the vocoids are salient even in VrV, as suggested by the steady-state portion of the formants immediately before and after the constricted interval.
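The reading of Figures 6-8 rests on the usual acoustic correlates of vowel quality: F1 falls as a vowel becomes higher, and F2 rises as it becomes more front. The following minimal Python sketch makes that mapping explicit. The function name and all formant values are illustrative assumptions, not measurements from this study.

```python
def describe_shift(vowel, near_constriction):
    """Describe the change in vowel quality between the vowel midpoint and
    the edge of the constricted interval, given (F1, F2) pairs in Hz."""
    f1_v, f2_v = vowel
    f1_c, f2_c = near_constriction
    # A lower F1 corresponds to a higher (closer) vowel.
    if f1_c < f1_v:
        height = "higher"
    elif f1_c > f1_v:
        height = "lower"
    else:
        height = "same height"
    # A higher F2 corresponds to a more front vowel.
    if f2_c > f2_v:
        backness = "more front"
    elif f2_c < f2_v:
        backness = "more back"
    else:
        backness = "same backness"
    return height, backness


# Hypothetical values, chosen only to mimic the direction of change described above.
print(describe_shift((750, 1300), (600, 1500)))  # [a]-like vowel: ('higher', 'more front')
print(describe_shift((300, 2300), (350, 2000)))  # [i]-like vowel: ('lower', 'more back')
```

Under these assumptions, both an [a]-like and an [i]-like starting point move towards the same mid, central-to-front region, which is the pattern the figures are taken to show.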

#### 3. Phonetic conclusions

The data from Romanian, corroborated with data from other languages, seem to support the hypothesis that [ɾ] includes one vocalic element flanking each side of the constriction. The results of the current experiment on this sound in Romanian suggest that the tap's vocalic elements may vary in quality, but stay in the mid-high, central to front area.

The tap's vocoids are not clearly delimited where the rhotic borders with a vowel because on a spectrogram they show up as a continuous vocalic sequence, perhaps with formant changes. One cannot tell where the vocoid of the tap ends and the full vowel begins. However, the vocoids of the tap become salient when they border with stop consonants because the stops have different spectral characteristics. This is why one vocalic element is salient when the tap has a nuclear vowel on one side: the vocoid on the other side would not be distinguishable from the nuclear vowel. The full structure is easy to distinguish only when [ɾ] has no nuclear vowels on either side (contexts #rC, CrC, Cr#).

<sup>4</sup> An anonymous reviewer draws my attention to the fact that, in VrV, we might view the vocalic parts as simple transitions. While this is indeed the case, I consider that they are better viewed as parts of the tap, given the clearly delimited vocoids that appear in other contexts. Future research may shed more light on the matter, for example, by comparing the transitions in VrV to those in VdV, as the reviewer suggests.

#### 4. A phonological implication

This kind of structure, which includes vowel-like parts, may be what allows [ɾ] to appear in onset and coda position, but also function as a syllabic nucleus, as is the case in Slavic languages like Czech and Serbo-Croatian. In these languages one can find entire sentences composed only of consonants (see Figure 9 below).

Figure 9 – The Czech tongue twister sentence *Strč prst skrz krk* 'Put your finger through your throat', which contains only consonants. The syllabic nuclei are rhotic taps. Source of the sound-file: http://upload.wikimedia.org/wikipedia/commons/1/12/Prst\_a\_krk.ogg.

/r/ may even be the locus of phonemic length and pitch distinctions in Slovak (length) and Serbo-Croatian (length and pitch) (Sussex & Cubberley 2006: 187-188). If its [ɾ] realization contains vocalic elements (see Pavlík 2008 for an acoustic study of /r/ in Slovak), it would be reasonable to assume that they are the ones bearing said distinctions<sup>5</sup>. As an example, in Serbo-Croatian there are minimal pairs of words distinguished only by the tone on /r/. For instance, *bŕzo* (long rising) is the adjective 'quick', neuter singular form, while *bȓzo* (long falling) is the corresponding adverb, 'quickly'. Figure 10 below shows the minimal pair uttered by a native speaker. /r/ is realized as [ɾ], and the structure 'vocalic element – constriction – vocalic element' is easily distinguishable.

<sup>5</sup> /r/ is not the only consonant with the ability to be a syllabic nucleus and bear length and pitch distinctions. In Czech /l/ can be syllabic as well, and in Slovak it can be a syllabic nucleus and carry length distinctions along with /r/. As is known, /l/ and /n/ are vowel-like on a spectrogram, something which I consider to be related to their ability to be syllabic nuclei, and carry length distinctions in the case of /l/. In fact, if the ability of /l/ to exhibit this behavior is linked to its vocalic character, it would be only expected for /r/ (in this case the tap) to be vocalic in character as well, which may be taken as an additional argument in favor of the 'vocoid-constriction-vocoid' structure.

Figure 10 – [bɾ̩̌ːzo] and [bɾ̩̂ːzo], uttered by a female native speaker of Serbian.

#### 5. Conclusion

The main focus of this paper was to establish the details of the internal phonetic structure of [ɾ]. It was argued that the general structure of this sound is 'vocoid-constricted interval-vocoid', which would unify the seemingly different realizations of the rhotic segment with one constricted interval that appear in different phonetic contexts. An acoustic analysis of the formant structure of the aforementioned vocoids in Romanian revealed them to be mid-high and central (to front), which agrees with and completes similar acoustic studies, done on this sound in other languages. Finally, I suggested that this partly vocalic structure is what allows the tap to be a syllable nucleus and bear phonemic length and pitch distinctions, as it does in languages like Slovak and Serbo-Croatian.

#### References


Boersma, Paul & David Weenink. 2011. *PRAAT: doing phonetics by computer: version 5.2.14.*


## New insights into American English V+/r/ sequences

#### María Riera & Joaquín Romero, Universitat Rovira i Virgili

#### Abstract

This paper presents an acoustic study of final V+/r/ sequences in American English stressed monosyllables. We provide experimental data to show the durational and spectral characteristics of the vowel, the consonant and the VC transition, we explain the presence of this transition in relation to the vowel and the consonant, and we examine the role of speaking rate. The results show the presence of a transitional vocalic element that varies significantly as a function of the vowel and speaking rate. They also show significant durational and spectral differences which can be interpreted as the result of VC coarticulation.

#### 1. Introduction

#### 1.1 Overview

The study presented in this paper forms part of a wider ongoing acoustic study that seeks a better understanding of the phonetic and phonological nature of final V+/l/ and V+/r/ sequences in American English stressed monosyllables by investigating the VC coarticulatory processes that take place in them. On the one hand, the present study expands on and replicates in part previous studies carried out by the authors (Riera & Romero 2006, 2007; Riera et al. 2009) in an attempt to gain new insights into the behavior of V+/r/ sequences in particular. On the other hand, the present study introduces innovative aspects related to participants, stimuli, segmentation procedures and measurements taken: the number of participants has been increased, the stimuli have been modified, a more objective method of segmentation and boundary identification has been applied and consonant (i.e., /r/) measurements have been included. In this study we provide experimental acoustic data to show the durational and spectral characteristics of the vowel, the consonant and the VC transition, we explain the presence of this transition in relation to the vowel and the consonant, and we examine the role of speaking rate.

#### 1.2 Previous studies

Previous studies that have looked into V+/r/ sequences have focused on the schwa-like element that is often perceived in some of these sequences. Terms like *epenthetic schwa* (Warner et al. 2001), *excrescent schwa* (Gick & Wilson 2001, 2006) or *targetless schwa* (Browman & Goldstein 1992b) might be used to refer to this element. According to Gick & Wilson (2001, 2006), the perceptual presence of this element after high front vowels can be explained as the result of the tongue movement required in passing through a schwa-like configuration. Browman & Goldstein (1992b) make reference to the influence exerted by neighboring segments on what they call *targetless schwa*. Wells (2000) uses the term pre-r breaking<sup>1</sup> to refer to cases of schwa epenthesis in sequences containing high vowels, whereby monophthongs become diphthongs and diphthongs become triphthongs. Lavoie & Cohn (1999) state that monosyllables consisting of non-low tense pure vowels or diphthongs followed by a liquid can be pronounced with either one or two syllables. Hall (2003, 2006) distinguishes between schwa intrusion and schwa epenthesis/insertion. In her view, intrusive vowels are phonologically invisible, are inserted late in the phonological derivation, cannot act as syllable nuclei, do not add a syllable to the word and do not involve the addition of a vowel segment. Moreover, they are not likely to occur in the most marked types of CC clusters, tend to occur between heterorganic consonants, copy only over sonorants or gutturals and are either copy vowels or neutral and schwa-like in quality.

Riera & Romero (2006) provide an impressionistic analysis of V+/l/ and V+/r/ sequences by means of visual spectrographic observation in a preliminary descriptive study that relies on acoustic data from two speakers and considers the whole range of American English stressed vowels. The study acknowledges the presence of VC transitions in some of these sequences and of a variable schwa-like element which is not visually detectable to the same extent in all of the VC transitions. It also suggests a relationship between front versus back versus central vowels as well as between high and tense versus non-high and lax ones. No acoustic measurements are taken in this study. The role of speaking rate is evidenced only by the fact that VC transitions are more easily discernible in slow tokens than in fast ones. It is concluded that the presence of the transitional element is the result of a dynamic phonetic process of coarticulation rather than of a discrete phonological rule of epenthesis/insertion.

In experimental studies conducted by Riera & Romero (2007) and Riera et al. (2009), durational and spectral measurements reveal differences between the

<sup>1</sup> Wells (2000) also uses the term pre-/l/ breaking to refer to cases of schwa epenthesis in V+/l/ sequences.

schwa-like element and canonical schwa<sup>2</sup> as well as variability in the schwa-like element as a function of both the preceding vowel and speaking rate. The formant values of this element are significantly different from those of canonical schwa and tend to resemble more those of the preceding vowel the faster the speaking rate in both the V+/l/ (Riera & Romero 2007) and the V+/r/ (Riera et al. 2009) sequences. The phenomenon under analysis is regarded in these studies as a generalized process affecting all contexts (i.e., all stressed vowels + /l/ or /r/), rather than, for example, only high vowels, as has been implied by previous studies (Gick & Wilson 2001, 2006; Lavoie & Cohn 1999; Riera & Romero 2006; Wells 2000). As in Riera & Romero (2006), coarticulation, rather than epenthesis/insertion, is favored. The segmentation procedure in these studies is based solely on the observation of acoustic waveforms and spectrograms as well as on the auditory corroboration by the experimenters. These studies rely on acoustic data from only one (Riera & Romero 2007) or two (Riera et al. 2009) speakers. Durational, F1 and F2 measurements are obtained in both studies, but F3 measurements are obtained only for the V+/r/ sequences. Measurements for the vowel and the transitional element only are obtained in both studies; neither of them includes consonant (i.e., /l/ or /r/) measurements and thus the behavior of the transitional element is explained only in terms of its relationship with the preceding vowel. Speaking rate (i.e., slow vs. fast) differences are considered in both studies. Thus, the current study expands on these previous findings by offering a more reliable methodological approach to segmentation and by providing data for the consonants as well as for a larger pool of subjects. Also, it provides measurements of the different parts of the sequences taken at midpoint rather than mean measurements of them, which was the measurement procedure used in previous studies.

#### 1.3 The present study: Objectives and hypotheses

As mentioned above, the overall main objective of this study is to investigate the VC coarticulatory processes that take place in final V+/r/ sequences in American English stressed monosyllables. In order to do this, (i) we provide experimental data to show the durational and spectral (i.e., F1, F2 & F3) characteristics of the vowel, the consonant and the transitional vocalic (i.e., schwa-like) element, (ii) we explain the presence of this element in relation to the vowel and the consonant, and (iii) we determine the role of speaking rate (iiia) by looking for durational, F1, F2 and F3 variability in the vowel, the transitional element

<sup>2</sup> Canonical schwa refers to a lexically-licensed vowel that shows relatively stable spectral characteristics and is not usually subject to significant contextual variability, as in the first syllable of the word *ahead*.

and the consonant, and (iiib) by comparing F1, F2 and F3 mean values in the different contexts (i.e., each of the V+/r/ sequences) and the different rates (i.e., slow and fast).

The results of this study are expected to provide evidence for the existence of coarticulatory processes and to make manifest the extent of VC coarticulation in the V+/r/ sequences under study. By looking into the behavior of the vowel, the transitional element and the consonant in the sequences, and by looking at the influence exerted by both the vowel on the transitional element and the transitional element on the consonant, the phonetic, rather than phonological, nature of the transitional element will be revealed.

We hypothesize (i) that there will be significant durational, F1, F2 and F3 variability in the vowel, the transitional element and the consonant, across contexts, and as a function of speaking rate, and (ii) that the F1, F2 and F3 mean values of the different contexts will tend to resemble each other more in the slow-rate productions than in the fast-rate ones. This is expected to be especially the case for the vowel and the transitional element but not so much for the consonant. The greatest differences are expected to be particularly noticeable for F1 and F2 but less so for F3.

The hypotheses presented here regarding the coarticulatory nature of the V+/r/ transitions are in accordance with the approach to speech production and gestural organization illustrated by the theory of Articulatory Phonology (Browman & Goldstein 1986, 1989, 1990a, 1990b, 1992a; Goldstein & Fowler 2003). Articulatory Phonology offers a view of phonological organization based on articulatory gestures as primitive units that are responsible for both phonological invariance and phonetic variability and thus bridges the gap between the two levels of description. A key aspect of the theory for our study is the fact that it contemplates time as an intrinsic part of the description of gestures, therefore providing a much more plausible explanation for the coarticulatory variability caused by rate differences, than would be given by a theory based on discrete underlying segments.

#### 2. Method

#### 2.1 Speakers

The subjects that participated in the experiment were six native speakers of American English. Four were male and two female. They all had rhotic accents. Three had a western accent (California, Utah, Wyoming), one a midwestern accent (Wisconsin) and one an upper-southern accent (Tennessee). The last speaker reported having lived in different parts of the US but self-identified her accent as being midwestern. Their ages ranged from 24 to 40. Four of them were temporarily living in Spain for a period of at least one year at the time of the recording; two had been living in Spain for over five years. Only one speaker had some specialized phonetic training; the rest had none. All the speakers were unaware of the purposes of the experiment prior to being recorded. Sex, type of accent, age, place of residency, contact with or knowledge of the Spanish language, and specialized phonetic training were not considered likely to affect the purposes of our experiment in any negative way.

#### 2.2 Stimuli

The target words that were selected for the experiment reported in this paper were seven English monosyllables containing final V+/r/ sequences (i.e., *fear*, *fair*, *par*, *pore*, *poor*, *hire* and *power*). Fifteen English monosyllables containing final V+/l/ sequences (i.e., *feel*, *bill*, *pale*, *fell*, *pal*, *Poll*, *Paul*, *hole*, *pull*, *pool*, *hull*, *furl*, *pile*, *howl* and *boil*) were also included as target words to be separately analyzed as part of the wider ongoing study. Fifteen distracters, consisting of C1VC2, where C2 was one of /t/ or /d/, were included as well. These were the words *heat*, *fit*, *hate*, *vet*, *fat*, *hot*, *fought*, *vote*, *hood*, *food*, *hut*, *heard*, *hide*, *void* and *vowed*. All the target words and distracters were inserted in the carrier sentence *Say \_\_\_ for me again*. In order to minimize unwanted coarticulatory effects, C1 was a non-lingual (unlike /r/ and /l/) and oral (like /r/ and /l/) consonant in the target words, the distracters and the word *for*.

#### 2.3 Data collection

The six speakers performed two readings each of ten randomized repetitions of the carrier sentence containing the target words and distracters reported in the previous subsection. The first reading was performed at a slow speaking rate; the second at a faster one. The speaking rate variable was controlled for by presenting the slow-rate readings at four-second intervals separated by a three-second break every 20 sentences and the fast-rate readings at one-second intervals with a three-second break every five sentences. The readings took place in two different sessions separated by a 30-minute period. Each of the sessions was preceded by an instruction period and a trial period of 20 tokens, which were not used for the analysis. After the second session, the speakers were informed of the purposes of the experiment and were asked to fill out a questionnaire to provide some very general personal information relevant only to the purposes of the experiment. The data were recorded at a 44,100 Hz sampling rate directly into a laptop computer using an M-Audio Nova condenser microphone, an M-Audio Firewire Solo mobile interface, and the Praat speech analysis software (Boersma & Weenink 2010), which was also used for the subsequent data analysis.
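The size of each reading follows from the stimulus list in section 2.2: 7 V+/r/ targets, 15 V+/l/ targets and 15 distracters give 37 sentences per repetition, hence 370 sentences per reading. A minimal Python sketch of how such a reading list could be built is given below. The function name, the seed and the choice to shuffle within each repetition block are assumptions made for illustration; the text states only that the repetitions were randomized.

```python
import random

TARGETS_R = ["fear", "fair", "par", "pore", "poor", "hire", "power"]
TARGETS_L = ["feel", "bill", "pale", "fell", "pal", "Poll", "Paul", "hole",
             "pull", "pool", "hull", "furl", "pile", "howl", "boil"]
DISTRACTERS = ["heat", "fit", "hate", "vet", "fat", "hot", "fought", "vote",
               "hood", "food", "hut", "heard", "hide", "void", "vowed"]
CARRIER = "Say {} for me again"


def reading_list(repetitions=10, seed=0):
    """Return the carrier sentences for one reading.

    Assumption: every repetition contains each word once, in a fresh
    random order (the original randomization scheme is not specified).
    """
    rng = random.Random(seed)
    words = TARGETS_R + TARGETS_L + DISTRACTERS
    sentences = []
    for _ in range(repetitions):
        block = words[:]
        rng.shuffle(block)
        sentences.extend(CARRIER.format(word) for word in block)
    return sentences


slow_reading = reading_list()
print(len(slow_reading))  # 37 words x 10 repetitions = 370
```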

#### 2.4 Data analysis

#### 2.4.1 Segmentation procedure

From a segmental point of view, the V+/r/ sequences under study are considered to be composed of two elements only (i.e., a vowel followed by a consonant). However, in order to identify the transitional element in them, the sequences had to be divided into three parts, corresponding to the vowel, the transitional element and the consonant. In the case of sequences containing diphthongs, they were divided into four parts and it was the second element of the diphthong that was taken into account for the analysis.

Given the dynamic nature of the transitional element, and therefore the difficulties in identifying and delimiting it, we applied a first differentiation algorithm to the F1, F2 and F3 traces as identified by an automatic formant tracking routine in order to obtain velocity curves for each of these spectral events. This allowed us to automatically identify inflexion points in the formant traces that corresponded with the boundaries between the three portions of the signal under study and thus made it possible to isolate the transitional element. A Praat script was written to obtain these first derivative traces and identify the peaks of formant change given by velocity maxima and minima. These peaks were then taken as reference points for boundary placement.
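The idea of locating boundaries at velocity peaks can be illustrated with a short Python sketch. It is not the Praat script used in the study: the function names, the sampling step and the synthetic F2 trace are assumptions introduced here, and a real formant track would normally need smoothing before differentiation.

```python
def first_difference(trace, step):
    """Velocity of a formant trace (Hz/s) from successive samples `step` s apart."""
    return [(b - a) / step for a, b in zip(trace, trace[1:])]


def velocity_extrema(trace, step):
    """Indices of local maxima and minima in the velocity curve.

    Velocity sample i describes the change between trace samples i and i + 1.
    """
    velocity = first_difference(trace, step)
    extrema = []
    for i in range(1, len(velocity) - 1):
        prev, cur, nxt = velocity[i - 1], velocity[i], velocity[i + 1]
        is_max = cur > prev and cur >= nxt
        is_min = cur < prev and cur <= nxt
        if is_max or is_min:
            extrema.append(i)
    return extrema


# Synthetic F2 trace (Hz), sampled every 5 ms: steady, falling, steady again.
f2 = [2300, 2300, 2290, 2250, 2150, 2000, 1900, 1860, 1850, 1850]
print(velocity_extrema(f2, step=0.005))  # [4]: F2 falls fastest between samples 4 and 5
```

In this toy trace the single velocity minimum marks the point of fastest F2 change, which is the kind of reference point used for boundary placement.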

Figure 1 illustrates the segmentation procedure. The upper part of the figure shows the acoustic wave and the spectrogram for the slow version of the /ir/ sequence in the word *fear* as produced by one of the speakers. In the lower part there are a series of tiers which provide information about F1 and F2 first derivative peaks of formant change indicating velocity maxima and minima (first and third tiers). These maxima and minima correspond to inflection points in the velocity trace and can, therefore, be identified with the beginning and end of specific events. The second tier in this lower part of Figure 1 shows the segmentation of the sequence into three parts (i.e., vowel, transitional schwa-like element and consonant). The vertical lines in this second tier are determined by observing where the broken lines in the F1, F2 and, if necessary, F3 tiers fall, and then by deciding which of these lines correspond to the beginning and end of the different parts of the sequence. In the case exemplified here, one peak provided by the F2 derivative was chosen to mark the beginning of the schwa-like element, whereas one peak in the F1 derivative was chosen to mark its end. Because it was not necessary to rely on F3 derivative peaks, these are not shown in the figure.

As might be inferred from the information provided in Figure 1, problems related to determining boundary placement often arise. In such cases, the automated procedure needs to be complemented by visual observation of waveform and spectrographic cues as well as by auditory corroboration. This is particularly necessary in the case of fast tokens and sequences containing low back vowels (i.e., /ɑ/ and /ɔ/). The objective segmentation procedure can then be considered to be more reliable than the subjective one only to a certain extent, but nonetheless reliable enough to the point that it allows for consistency in the segmentation procedure.

Figure 1 – Segmentation procedure for the /ir/ sequence corresponding to the slow version of the word FEAR as produced by one of the speakers. The vertical broken lines in the first and third textgrid tiers represent the F1 and F2 first derivative peaks of formant change indicating velocity maxima/minima. The second tier shows the /ir/ sequence segmented into three parts: vowel, transitional element and consonant.

#### 2.4.2 Measurements

A Praat script was designed to extract midpoint duration, F1, F2 and F3 values for the vowel, the transitional element and the consonant. Mean values for each context (i.e., each of the V+/r/ sequences) and for each rate (i.e., slow and fast) were then obtained and used for the statistical analyses. F1, F2 and F3 mean values were also used for comparisons between the slow and fast speaking rates.
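The two steps described here, reading a formant value at the midpoint of a segmented interval and averaging over tokens by context and rate, can be sketched in Python as follows. This is an illustration under stated assumptions, not the Praat script used in the study: the function names and all numeric values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def midpoint(start, end):
    """Temporal midpoint (s) of a segmented interval."""
    return (start + end) / 2.0


def value_at(times, values, t):
    """Value of the formant-track sample closest in time to t."""
    i = min(range(len(times)), key=lambda k: abs(times[k] - t))
    return values[i]


def cell_means(tokens):
    """Average a measurement over tokens sharing the same (context, rate)."""
    cells = defaultdict(list)
    for context, rate, value in tokens:
        cells[(context, rate)].append(value)
    return {cell: mean(vals) for cell, vals in cells.items()}


# Hypothetical F1 track (Hz) and an interval running from 0.01 s to 0.03 s.
times = [0.00, 0.01, 0.02, 0.03, 0.04]
f1 = [500, 520, 560, 590, 600]
print(value_at(times, f1, midpoint(0.01, 0.03)))  # 560

# Hypothetical midpoint F1 values for three tokens of one context.
tokens = [("fear", "slow", 310), ("fear", "slow", 330), ("fear", "fast", 360)]
print(cell_means(tokens))
```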

#### 2.4.3 Statistical analyses

Two-way factorial ANOVAs were performed to test for duration, F1, F2 and F3 overall variability in the vowel, the transitional element and the consonant. The independent variables were rate (i.e., slow and fast) and context (i.e., each of the V+/r/ sequences); the dependent variables were duration, F1, F2 and F3 mean values.

In the cases where interactions proved to be significant, independent one-way ANOVAs for each of the two rates (i.e., slow and fast) were subsequently performed to confirm the variability shown by the two-way factorial ANOVAs, or to test for further variability, by looking at the two rates separately. The independent variable was context (i.e., each of the V+/r/ sequences); the dependent variables were duration, F1, F2 and F3 mean values.
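For readers unfamiliar with the design, the logic of a two-way factorial ANOVA with rate and context as factors can be sketched in plain Python. The sketch below computes only the F statistics for a balanced design (the same number of tokens in every cell); obtaining p-values would additionally require the F distribution from a statistics package. The function name and the toy data are assumptions for illustration and do not reproduce the analysis software or the data of the study.

```python
from statistics import mean


def two_way_anova(data):
    """F statistics for a balanced two-way design.

    `data` maps (rate, context) -> list of measurements; every combination
    of rate and context must be present with the same number of tokens.
    """
    rates = sorted({r for r, _ in data})
    contexts = sorted({c for _, c in data})
    n = len(next(iter(data.values())))
    assert all(len(v) == n for v in data.values()), "design must be balanced"

    grand = mean(x for v in data.values() for x in v)
    rate_mean = {r: mean(x for c in contexts for x in data[(r, c)]) for r in rates}
    ctx_mean = {c: mean(x for r in rates for x in data[(r, c)]) for c in contexts}
    cell_mean = {k: mean(v) for k, v in data.items()}

    a, b = len(rates), len(contexts)
    ss_rate = b * n * sum((m - grand) ** 2 for m in rate_mean.values())
    ss_ctx = a * n * sum((m - grand) ** 2 for m in ctx_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_inter = ss_cells - ss_rate - ss_ctx
    ss_within = sum((x - cell_mean[k]) ** 2 for k, v in data.items() for x in v)

    ms_within = ss_within / (a * b * (n - 1))
    return {
        "rate": (ss_rate / (a - 1)) / ms_within,
        "context": (ss_ctx / (b - 1)) / ms_within,
        "rate*context": (ss_inter / ((a - 1) * (b - 1))) / ms_within,
    }


# Toy data: two rates, two contexts, two tokens per cell (arbitrary units).
toy = {("slow", "x"): [1, 2], ("slow", "y"): [3, 4],
       ("fast", "x"): [2, 3], ("fast", "y"): [6, 7]}
print(two_way_anova(toy))
```

A one-way ANOVA per rate, as used for the follow-up tests, corresponds to running the same between-groups versus within-groups comparison on the cells of a single rate.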

#### 3. Results

#### 3.1 ANOVAs for variability

As mentioned above, the two-way factorial ANOVAs looked for duration, F1, F2 and F3 overall variability<sup>3</sup> in the vowel, the transitional element and the consonant. Rate and context were the independent variables and duration, F1, F2 and F3 mean values the dependent variables. Significance level was set at p<.01. Significant differences were obtained in almost all cases. The results showing variability in the vowel were significant for all speakers, for rate, context and the interaction between rate and context, and for duration, F1, F2 and F3. The results showing variability in the transitional element and the consonant were significant for all speakers, for context, and for duration, F1, F2 and F3. For the transitional element, they were non-significant in the following cases, which involve combinations of rate or the interaction between rate and context, and duration, F1, F2 or F3: Speaker 2 (rate, F2), Speaker 3 (rate\*context, duration; rate, F3), Speaker 4 (rate, F3), and Speaker 6 (rate\*context, duration; rate, F1). The results showing variability in the consonant were non-significant in the following cases: Speaker

<sup>3</sup> Here *variability in the vowel* does not refer to intra-token variability but rather to the comparison between the mean values for the vowels in the different contexts (i.e., *fear* vs. *hire* vs. *fair* vs. *par* vs. *pore* vs. *poor* vs. *power*). As expected, the results show that these vowels are indeed different. The reason why we have decided to make this seemingly obvious comparison is so that it can then be compared with the differences in the transitional element and thus show that the transitional element retains some of the variability of the vowel but is much more deeply affected by the lack of a specific articulatory target, as demonstrated by the significant differences across rates.

1 (rate\*context, duration; rate, F1), Speaker 2 (rate, F2; rate\*context, F3), Speaker 3 (rate\*context, duration), Speaker 4 (rate\*context, duration; rate\*context, F1), Speaker 5 (rate\*context, duration; rate, F2), and Speaker 6 (rate, F2).

The results of the separate one-way ANOVAs performed to confirm the variability shown by the two-way factorial ANOVAs, or to test for further variability, with context as the independent variable and duration, F1, F2 and F3 as the dependent variables, also yielded significant differences in almost all cases. Significance level was again set at p<.01. As with the two-way factorial ANOVAs, the results showing variability in the vowel were significant for all speakers, for duration, F1, F2 and F3, and for both rates. The results showing variability in the transitional element were significant for five speakers, for duration, F1, F2 and F3, and for both rates. The exception was Speaker 4, with non-significant differences for duration for both rates. The results showing variability in the consonant were significant for two speakers, for duration, F1, F2 and F3 for the slow rate. They were non-significant in the following cases: Speaker 1 (F2, slow), Speaker 2 (F3, slow), Speaker 4 (duration, slow), and Speaker 5 (duration, fast; F2, slow).

#### 3.2 Means for comparisons between speaking rates and variability

Figures 2, 3 and 4 show scatter plots for mean F1, F2 and F3 values, respectively (i) for one speaker, (ii) for the seven V+/r/ contexts, (iii) for the vowel, the transitional element and the consonant, and (iv) for the slow-rate and fast-rate productions. Due to space constraints, the data from one speaker only will be used to exemplify what, as a general rule, applies to the other five speakers as well.

As can be observed in these scatter plots, formant values for the same target words show clear differences across speaking rates (e.g., compare slow and fast *fair* V F1, slow and fast *hire* T F2 or slow and fast *poor* C F3). This is especially noticeable for F1 and F2 and less so for F3. It is also particularly discernible in the case of the vowel and the transitional element, but not so much in the case of the consonant.

What can also be detected in these scatter plots is the fact that the difference between the mean values across contexts tends to be smaller in the slow-rate productions than in the fast-rate ones. In other words, there is greater dispersion between the mean values of the seven tokens in the fast rates than in the slow ones. Again, this can be easily seen in the case of F1 and F2 but is not easy to perceive in the case of F3. Likewise, it is more easily distinguishable in the vowel and the transitional element than in the consonant.
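The notion of dispersion between context means can be made concrete with a short Python sketch. It summarises, for each speaking rate, how far apart the per-context means lie, using the range and the population standard deviation. The function name and the formant values are hypothetical and serve only to show the computation; they are not taken from Figures 2, 3 and 4.

```python
from statistics import pstdev


def dispersion_by_rate(context_means):
    """Map each rate to the range and SD of its per-context mean values."""
    summary = {}
    for rate, means in context_means.items():
        values = list(means.values())
        summary[rate] = {
            "range": max(values) - min(values),
            "sd": pstdev(values),
        }
    return summary


# Hypothetical mean F2 values (Hz) for three of the target words.
example = {
    "slow": {"fair": 1500.0, "hire": 1450.0, "poor": 1400.0},
    "fast": {"fair": 1700.0, "hire": 1450.0, "poor": 1200.0},
}
summary = dispersion_by_rate(example)
```

With these made-up numbers the fast-rate means are more spread out than the slow-rate ones, which is the pattern described in the text.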

These observations provide grounds to state that, albeit to different extents, F1, F2 and F3 mean values vary across rates in the vowel, the transitional element and, to a lesser extent, the consonant.

Figure 2 – Scatter plots for F1 values for one speaker by context. Each data point represents the mean for 10 tokens in each category. The vertical axis shows F1 frequency. The horizontal axis shows the values for the vowel (V), transition (T) and consonant (C) as well as the slow and fast rate values for each of these.

Figure 3 – Scatter plots for F2 values for one speaker by context. Each data point represents the mean for 10 tokens in each category. The vertical axis shows F2 frequency. The horizontal axis shows the values for the vowel (V), transition (T) and consonant (C) as well as the slow and fast rate values for each of these.

Figure 4 – Scatter plots for F3 values for one speaker by context. Each data point represents the mean for 10 tokens in each category. The vertical axis shows F3 frequency. The horizontal axis shows the values for the vowel (V), transition (T) and consonant (C) as well as the slow and fast rate values for each of these.

#### 4. Discussion and conclusions

The purpose of this study has been to further investigate the VC coarticulatory processes that take place in final V+/r/ sequences in American English stressed monosyllables in order to contribute new insights into the behavior and nature of these sequences. These insights are meant to expand on the results obtained and conclusions reached in previous studies carried out by the same authors (Riera & Romero 2006, 2007; Riera et al. 2009). We have designed an experiment which replicates in part these previous studies but also introduces innovative aspects related to participants, stimuli, segmentation procedures and measurements taken. We have gathered acoustic data for the different constituent elements in the sequences that have allowed us to confirm already existing conclusions and reach new ones concerning the role played by speaking rate.

The first hypothesis (i.e., that there is significant durational, F1, F2 and F3 variability in the vowel, the transitional element and the consonant, across contexts, and as a function of speaking rate) has been confirmed by the results of the statistical analyses, as well as by the information presented in Figures 2, 3 and 4. This provides evidence for the existence of coarticulatory processes and shows the extent of VC coarticulation in the V+/r/ sequences which are the object of our study. Despite having mean duration, F1, F2 and F3 values similar to those of a mid central vowel (i.e., schwa), the transitional element has been proven to be different in each of the different contexts (i.e., different vowels + /r/). This rules out the possibility of the transitional element being considered a segment and thus reveals its phonetic, rather than phonological, nature.

The second hypothesis (i.e., that the F1, F2 and F3 mean values of the different contexts tend to resemble each other more in the slow-rate productions than in the fast-rate ones) provides evidence for the dynamic nature of the sequences, in general, and of the transitional element, in particular. The information provided in Figures 2, 3 and 4 evidences that it takes longer for the vowel to attain the transitional element target and for this element to attain the consonant target in the slow productions than in the fast ones. It shows, therefore, how an increase in speech rate entails a decrease in time for the articulatory gestures to attain their targets. All in all, it proves that we are dealing with a process of coarticulation rather than epenthesis/insertion. This also complements the findings of previous studies (Riera & Romero 2007; Riera et al. 2009) that reveal how the coarticulatory influence of the vowels on their corresponding transitional elements is shown by the fact that the spectral values of these elements tend to resemble more those of the preceding vowels the faster the speaking rate.

Despite not varying much across V contexts, the acoustic characteristics of the /r/ in the different sequences show some variability, which can be taken as proof of the coarticulatory influence exerted by the vowel on the schwa-like element, by the schwa-like element on the consonant, and even by the vowel on the consonant. The fact that the variability is smaller in the /r/ than in the schwa-like element is explained by the fact that the /r/ is present underlyingly and, therefore, it is associated with clearly determined articulatory targets, whereas the schwa-like element does not correspond to any underlying segment and, therefore, has no specific articulatory targets.

The present study has not aimed at finding relationships between the sequences according to the phonological parameters for the classification of vowels (i.e., vowel height or frontness/backness). A possible further study would look into the role played by context (i.e., each of the seven different vowels in the V+/r/ sequences) and would also examine vowel-transition and transition-consonant differences.

Finally, we believe that the limitations posed by an acoustic analysis of the type reported in this paper, based on segmentation as well as durational and spectral measurements, can only be overcome by an articulatory analysis of the type offered, for example, by the Electromagnetic Midsagittal Articulometer (EMMA) technique. This type of study is meant to be considered for future research.

#### Acknowledgments

Research Group in Experimental Phonetics (Universitat Rovira i Virgili, Tarragona, Spain).

Research Groups 2005-SGR00864 and 2009-SGR003 (Generalitat de Catalunya, Spain: Institut d'Estudis Catalans and Universitat Autònoma de Barcelona).

Projects HUM2005-02746 and FFI2010-19206 (Ministerio de Educación y Ciencia, Spain: Universitat Autònoma de Barcelona).

The authors would also like to thank two anonymous reviewers for their insightful comments and suggestions.

#### References

Boersma, Paul & David Weenink. 2010. *Praat: doing phonetics by computer: Version 5.2*.


language use. In Niels Schiller & Antje Meyer (eds.), *Phonetics and phonology in language comprehension and production: differences and similarities*, 159-207. Berlin: Mouton de Gruyter.


## /r/ in Washili Shingazidja

#### Cédric Patin, Université Lille 3

#### Abstract

In this paper, the distribution of the various allophones of /r/ in the Washili variety of Shingazidja, a Bantu language spoken on Grande Comore, is discussed in detail. /r/ appears as a *trill* ([r]) in absolute initial position (except before [i]) and after a consonant, and as a *tap* ([ɾ]) in intervocalic position. Complications arise since /r/ undergoes fortition to [ʈ ʂ ] in some classes but undergoes lenition in initial position when the following vowel is low-toned. An analysis is sketched in the CVCV framework (Lowenstamm 1996; Scheer 2004), claiming that the [r] allophone is underlyingly a geminate.

#### 1. Introduction

In this paper, I discuss in detail the distribution of the various allophones of /r/ (e.g. the *trill* [r] and the *tap* [ɾ]) in the Washili variety of Shingazidja. Shingazidja is a Bantu language (G44a) spoken on Grande Comore, an island belonging to Comoros (Shingazidja is one of the five Comorian languages). This is to my knowledge the first account of the distribution of rhotics in the language, and one of the very few discussions on rhotics in Bantu languages. A CVCV analysis of the distribution of the allophones of /r/ in Washili Shingazidja is also provided.

One speaker of this variety, Said Mohamed (34; in France for approximately 10 years), has been recorded (specifically for /r/) up to the present, with most of the recordings taking place in August 2010 and April 2011. The corpus consists of around 100 words and 20 phrases and sentences, each associated with several iterations, which were recorded twice: at Université Lille 3 (Villeneuve d'Ascq, France), in a closed office, with an Edirol R1 (microphone) and at ILPGA, Université Paris 3 (Paris, France), in an anechoic room.

In Section 2, I will provide some background information on Shingazidja, i.e. its phoneme inventory and previous mentions of /r/ in the literature. In Section 3, the basic distribution of the different allophones of /r/ is presented. I point out some complications, i.e. the role of consonants and tones in the distribution, in Section 4. In Section 5, I will defend the hypothesis, sketched in the CVCV framework, that the trill is associated with two skeletal positions (while the tap and the [ʈ ʂ ] allophone are associated with one skeletal position).

#### 2. Background

In this section, I provide some necessary background on Shingazidja as a language. All the information in this section may be applied to any of the varieties of Shingazidja. The first subsection is dedicated to the vowels and prosodic system of the language, while the second subsection focuses on consonants. In Section 2.3, I briefly review previous accounts of /r/ in Shingazidja.

#### 2.1 Vowel inventory and the prosodic system

Shingazidja has a classic 5-vowel system:

There are also nasal vowels in some Arabic loans (1-a), mostly when the Arabic word contains a pharyngeal or a glottal (a phenomenon known as 'rhinoglottophilia', a term that comes from Matisoff 1975), or in ideophones (1-b).

(1) a. á̰da 'custom' (< Ar. *ʔadah*) b. á̰ ! há̰ 'no'

Shingazidja has a word-group stress that falls on the penult of the phonological phrase. The language is also characterized by a reduced tone system (similar to a pitch-accent system) with complex manifestations such as unlimited shift of the tone – see Cassimjee & Kisseberth (1998), Patin (2007).

#### 2.2 Consonants

Table 1 shows the consonant inventory of Shingazidja, following Ahmed-Chamanga (2010), Full (2006), Lafon (1987), Rombi & Alexandre (1982) and my own observations.


Table 1 – Shingazidja consonants.

A large portion of the Shingazidja lexicon was borrowed from Arabic some centuries ago, and more recently but to a lesser extent from French. As a consequence, many consonants (namely those indicated in parentheses) generally surface only in (Arabic or French) loanwords. This is the case for the voiced labial and dental stops (2, 3) and [ʒ] (cf. the word *ʒandarmu* 'gendarme').


The interdental and velar fricatives and the glottal stop only appear in Arabic loanwords in formal speech (Ahmed-Chamanga 2010; Rombi & Alexandre 1982).


According to Rombi and Alexandre (1982), many speakers replace interdentals by [d], and velars by [h] (5).


<sup>1</sup> In this paper, underlining indicates that the *tone bearing unit* is lexically associated with a tone. Surface tones are signaled by acute accents.

On the other hand, the voiced implosives /ɓ/ and /ɗ/ (6-a, 6-b) and the retroflexes (6-c) essentially occur in the Bantu lexicon (though not exclusively – there is variation among speakers).


It is not clear if prenasalized consonants in Shingazidja (i.e. /mb, mɓ, nt, nd, nʦ, nʣ, nɗ, nɖ, nʧ, nʤ, ŋk, ŋg/) correspond to one or two phonemes. They will thus not be discussed in detail here, and they are not included in Table 1.

#### 2.3 Previous accounts of /r/ in Shingazidja

No specific study has focused on /r/ in Shingazidja, and very few words have been written on the subject in studies with a broader purpose.

All authors who mention /r/ agree on its realization as a trill (e.g. "Das Phonem /r/ wird realisiert als stimmhafter alveolar Vibrant ([r])" Full 2006:114). Ahmed-Chamanga (2010), for instance, claims that "La consonne vibrante r du comorien est une consonne produite avec une vibration du bout de la langue au niveau des alvéoles. Elle ressemble au 'r' de l'italien ou de l'espagnol" (Ahmed-Chamanga 2010:24).

However, I rarely observed a clear trill realization when I worked with my previous informants, who came from various locations on the island. In my data, /r/ mostly appears as a tap. As we shall see in the following sections, the situation is different in Washili.

#### 3. Basic distribution of rhotics in Washili Shingazidja

In this section, I examine the basic distribution of the trill [r] and tap [ɾ] allophones of /r/ in Washili Shingazidja. Section 3.1 discusses the trill realization that is associated with the absolute initial position. Section 3.2 deals with the tap allophone that emerges when /r/ is placed between two vowels inside the prosodic word, and section 3.3 shows that the tap is also selected when the intervocalic /r/ occurs at a word boundary.

#### 3.1 Absolute initial position

In absolute initial position, /r/ mostly appears as a trill [r] in Washili Shingazidja, especially before [+back] vowels (7). It is important to note that many of the words that exhibit an initial [r] are of Arabic origin (for instance ruhúsa 'permission' < Ar. *ruxsa*).<sup>2</sup> However, this is not always the case (see some imperatives, where [r] appears initially: réma 'beat!', ruká 'jump!').


Almost all the trills in my data consist of two periods of vibration. The trill realization of the initial /r/ is illustrated in Figure 1.

Figure 1 – Spectrogram of **[r]**áhḁ 'joy, happiness'.

Before [i], however, the trill (almost) never appears. Most of the time, a tap [ɾ] (sometimes an approximant [ɹ]) is realized (8).

(8) Before [i] **[ɾ]**iyáli 'money'

Before [e], the trill is possible (9-a), though not frequent: [e] is usually preceded by a tap (9-b).

(9) Before [e] a. **[r]**ehéma 'blessing' b. **[ɾ]**e\_gá 'take!' **[ɾ]**e\_éi 'come back!'

<sup>2</sup> It is not clear to what extent the words in (7) are of Arabic or Bantu origin. The interdental [ð] indicates that ráði 'blessing' most probably comes from Arabic (according to researchers such as Ahmed-Chamanga 2010:22, interdental and velar fricatives only appear in Arabic loanwords). One reviewer has suggested that *rúŋga* 'pitch' may come from the Proto-Bantu \*tung- 'to sew, thread', or \*túng 'to tie up'; I cannot offer a better hypothesis. *Róho* 'heart' may correspond to the Proto-Bantu \*jòjò 'heart, life' (Tervuren BLR3 database [http://www.africamuseum.be/collections/browsecollections/humansciences/blr]).

#### 3.2 /r/ in intervocalic position

In intervocalic position, /r/ usually emerges as a tap [ɾ] (10), sometimes as an approximant [ɹ] (the two sounds seem to be in free distribution, perhaps partly depending on parameters such as the delivery or the style of speech), never as a trill.

(10) [ɾ]i**[ɾ]**á\_ganya\_ 'we were destroyed' ma**[ɾ]**ávu\_ 'cheeks' ma**[ɾ]**á(ŋg)o̥ 'pumpkins' mi**[ɾ]**ú(n)\_a\_ 'orange tree' ma**[ɾ]**ú(m)\_o\_ 'insides'

The tap realization of the intervocalic /r/, which consists of a single closure, is illustrated in Figure 2.

Figure 2 – Spectrogram of ma**[ɾ]**á(ŋg)o̥ 'pumpkins'.

The tap realization occurs before and after all vowels and especially, as expected, before front vowels (11).

(11) mi**[ɾ]**éma 'cultivated field' ma**[ɾ]**ínɗ(i) 'banana tree'

However, some rare items in my corpus, generally the first members of sequences of iterations (or when the speakers overarticulate), involve a trill in this position.

#### 3.3 Intervocalic position across a word boundary

Since the /r/ appears as a trill in the absolute initial position and as a tap in intervocalic position, one may wonder how a word-initial /r/ would emerge. When /r/ appears between two vowels that belong to two different prosodic words belonging to the same prosodic phrase, the tap is selected (12-14). This is true no matter which vowels are involved, and regardless of whether the words are of Bantu or Arabic origin.

(12) ( ts(i)onó **[ɾ]**ah(a) )ɸ 'I saw joy' cf. **[r]**áhḁ 'joy, happiness'

(13) ( tsimbá **[ɾ]**að̥(i) )ɸ 'I gave him (a) blessing' cf. **[r]**á\_i\_ 'blessing'

(14) ( (\_)gamniko **[ɾ]**úhusa\_ )ɸ 'I gave permission' cf. **[r]**uhúsa 'permission'

The tap realization of the intervocalic /r/ occurring at a word boundary is illustrated in Figure 3.

Figure 3 – Spectrogram of ( tsimbá **[ɾ]**að̥(i) )ɸ 'I gave him (a) blessing'.

At the beginning of a non-initial prosodic phrase, /r/ also emerges as a tap (15).

(15) [ ( nɗa=mí )ɸ ( na=wé )ɸ ]I [ ( **[ɾ]**enʤẹz'=[é] nɗei̥ )ɸ ]I stab=1sg and=2g 1pl(pas).raise=the prices 'that's I and you who raised the prices'

The distribution of rhotics in Washili Shingazidja thus seems to be quite simple: the trill realization is restricted to the absolute initial position, except before front vowels, while the tap (or its approximant variant) is selected in intervocalic position, whether the intervocalic position occurs inside the word or between two words. In Section 4, I will show that the picture is a bit more complicated.

#### 4. Complications

In this section, I discuss the distribution of rhotics in Washili Shingazidja as a function of the presence of consonants before /r/ and the absence of high tones on a following vowel when /r/ occurs in word-initial position. In the former case, discussed in Section 4.1, a trill is selected. In the latter, discussed in Section 4.2, /r/ may be realized as a tap, or even be deleted.

#### 4.1 After a consonant

Washili Shingazidja clearly differs from the other Shingazidja varieties in the behavior of /r/ after a consonant. In this situation, /r/ emerges as a trill (16, 17).<sup>3</sup>

(16) a. n\_**[r]**avu\_ 'branch(es)' n\_**[r]**óvi\_\_ 'banana(s)' b. m-\_e[ɾ]é n-\_**[r]**a[\_\_](u) 'three rings' (10-ring 10-three)

(17) m**[r]**áhḁ 'game (specific)' m\_**[r]**ám\_uwa\_ 'you (pl) recognized'

In the variety of Shingazidja that is spoken in Moroni, and to a lesser extent in other varieties, the sequence /ɗ/d + r/ emerges as a retroflex [ɖ] ('lies' is realized *nɖaɓo̥* in Moroni, *nɖraɓo̥* in Washili).

The trill realization of the /r/ occurring after a consonant is illustrated in Figure 4.

<sup>3</sup> I have no evidence for or against an analysis where [ɗr] synchronically corresponds to a single affricate consonant, a hypothesis suggested by a reviewer. This idea receives support from Proto-Bantu (WS m-ɗrí 'tree' < PB \*mu-ti) and the fact that the implosion comes from post-nasal fortition – cf. n-ɗráɾu 'cl.10-three' vs. mi-ɾáɾu 'cl.4-three'.

Figure 4 – Spectrogram of m**[r]**áhḁ 'game (specific)'.

Interestingly, the post-consonantal rhotic also emerges as a trill when the vowel that follows is [i] (18).

(18) m̩ ɗ**[r]**í 'tree'

Between [m] and [i], the rhotic does not appear as a single tap either, since two closures are clearly perceptible. /r/, in this case, is realized as **[b ɾ]**.

(19) m̩ **[b ɾ]**ímḁ 'African (coast)' m̩ **[b ɾ]**ísizḁ 'you (pl) frightened'

In classes 3<sup>4</sup> (or 1) and 4, there is thus an alternation between [ɾ], occurring in class 4 (between two vowels, the class 4 prefix being *mi-,* the class 2 one *wa-*), and [r], occurring in classes 1 and 3 (the prefix in classes 1 and 3 is *m-*) (20).

(20) a. m-[r]ó 'river' vs. mi-[ɾ]ó 'rivers' b. m-[r]éma 'field' vs. mi-[ɾ]éma 'fields' c. m-[r]ún\_a 'orange tree' vs. mi-[ɾ]ún\_a 'orange trees'

These alternations also occur after the 1st and 2nd plural prefixes of the past (perfective) (21).

<sup>4</sup> Like the other Bantu languages (see Katamba 2003 for details), Shingazidja has a gender class system, where singular nouns belong to classes 1, 3, 5, 7, etc. while plural nouns belong to classes 2, 4, 6, 8, etc.

#### 4.2 /r/ and tones

In absolute initial position, when the vowel that follows /r/ is not associated with a high tone, there is usually no trill (there are some rare exceptions in my corpus).<sup>5</sup> Most of the time, /r/ then appears as an approximant (22), and can even be deleted.

(22) **[ɹ]**ahísi\_ 'inexpensive' **[ɹ]**a\_gú 'from' **[ɹ]**aíl\_i\_ 'as long as' **[ɹ]**uká 'jump!'


When the /r/ occurs in intervocalic position, however, the tone does not play a role – compare (23-a) to (23-b).
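The distribution described in Sections 3 and 4 can be read as a small decision procedure. The Python sketch below is only an illustration of that reading: the function name, the position labels and the Boolean tone parameter are assumptions introduced here, and the rarer outcomes noted in the text (occasional initial trills before [e], the [b ɾ] realization between [m] and [i], approximant variants of the tap, and deletion of /r/) are deliberately left out.

```python
def r_allophone(position, next_vowel, next_vowel_high_tone=True):
    """Return the most usual surface realization of /r/ (simplified).

    position: "post_consonantal", "intervocalic" (inside a word or across
        a word boundary), "non_initial_phrase_start" or "absolute_initial".
        These labels are hypothetical names chosen for this sketch.
    next_vowel: the vowel that follows /r/, e.g. "a" or "i".
    next_vowel_high_tone: whether that vowel is associated with a high tone.
    """
    if position == "post_consonantal":
        return "r"  # trill after a consonant, even before [i]
    if position in ("intervocalic", "non_initial_phrase_start"):
        return "ɾ"  # tap; tone plays no role here
    if position == "absolute_initial":
        if not next_vowel_high_tone:
            return "ɹ"  # usually an approximant when no high tone follows
        if next_vowel in ("i", "e"):
            return "ɾ"  # usually a tap before front vowels
        return "r"  # trill elsewhere in absolute initial position
    raise ValueError(f"unknown position: {position}")


# Usage, mirroring forms cited in the text:
initial = r_allophone("absolute_initial", "a")        # as in [r]áhḁ
medial = r_allophone("intervocalic", "a")             # as in ma[ɾ]á(ŋg)o̥
after_c = r_allophone("post_consonantal", "a")        # as in m[r]áhḁ
toneless = r_allophone("absolute_initial", "a", False)
```

Ordering the checks from post-consonantal to absolute initial reflects the fact that the tonal condition is only relevant in absolute initial position.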


#### 5. Analysis

#### 5.1 Background

Allophonic situations where a trill appears in the initial position and a tap (or a flap) occurs in the intervocalic position are far from rare (see Bradley 2001; Inouye 1995; Lindau 1985; Recasens 1991; Walsh Dickey 1997; Wiese 2001, 2011, among many others). This is for instance the case in Romanian (Chitoran 2001), Northern Italian (Recasens 2002) and Farsi ("In Farsi, /r/, which is a trill in initial position<sup>6</sup>, has a tap allophone in intervocalic position and a voiceless trill variant in word-final position" Ladefoged & Maddieson 1996:216). However, such a distribution, to my knowledge, has never been identified in any Bantu language. Nevertheless, according to Gérard Philippson (*personal communication*), it also appears in Chaga (Tanzania), a language that possesses another phonemic rhotic. In Davey et al. (1982), a paper concerning liquids in Chaga, this distribution is not explicitly discussed. However, the figures that illustrate the paper seem to attest its existence.

<sup>5</sup> In Shingazidja, the presence of a tone may also prevent vowels from gliding and from deletion (Patin 2009).

<sup>6</sup> The trill is voiceless in this position according to Majidi (1986, mentioned by Wiese 2011).

#### 5.2 Analysis

How can we account for the allophonic distribution of /r/ in Washili Shingazidja? I will assume that the trill realization that occurs in the absolute initial position corresponds to a geminate association (24).

I adopt in (24) a CVCV representation. The idea behind CVCV (Lowenstamm 1996; Scheer 2004), a theory that emerges from Government Phonology (Kaye et al. 1985, 1990), is that constituent structure can be reduced to a strict sequence of non-branching Onsets and non-branching Nuclei.

Three arguments support the structure in (24). First, it must be noted that many words involving a trill in the initial position are Arabic loanwords (25).


A geminate, in Arabic, results from assimilation of the article *al* when it is followed by 'r' (Classical Arabic: \*al-raᵓs ⟶ ar-raᵓs 'the head', Alfozan 1989).

(26) a**[r]**uħ < (a)l-ɾuħ 'the soul'<sup>7</sup>

One could suggest that combinations of *(a)l*+word or the forms where a geminate appears were borrowed.<sup>8</sup> An argument supporting this idea is the fact that several French loans were borrowed *with* the definite article.

<sup>7</sup> Rachid Ridouane, *personal communication*.

<sup>8</sup> If this idea is borne out, one might expect similar effects to be observed at least on some of the other consonants that are implicated in the assimilation process that involves the article in Arabic, especially the other coronals (/s/, /n/, etc.). A systematic examination of the Arabic loanwords, which has not yet been conducted, is thus necessary. Thanks to Jean-Marc Beltzung for this suggestion.


The result would involve an initial CV site<sup>9</sup> in the first position of the prosodic word (28).

Washili Shingazidja, in this respect, would be more conservative than other varieties, where the trill is rare in initial position.

The second argument that supports the structure in (24) is the fact that the initial trill cannot be considered the fortis counterpart of the tap. Fortition involves a (voiceless) retroflex, which is generally associated with clear friction in Washili Shingazidja: [ʈ ʂ ]. In classes 5 and 6, there is an alternation between [ɾ], which appears in class 6 (between two vowels, the class 6 prefix being *ma-*), and [ʈ ʂ ], which appears in class 5 (the main class 5 allomorph being *Ø-*).


Other alternations in this gender involve the transformation of [β] (class 6) to [p] (class 5) (30a), [h] (class 6) to [k] (class 5) or [l] (class 6) to [ɖ] (class 5) (30b).

<sup>9</sup> A *lexical* one, distinct from that which was proposed in Lowenstamm (1999).

(30) a. Ø*-*[p]áha 'cat (class 5)' vs. ma-[β]áha 'cats (class 6)' b. Ø*-*[ɖ]íŋgo 'back (class 5)' vs. ma-[l]íŋgo 'backs (class 6)'

In the CVCV framework, it will be assumed that fortition here involves an initial empty 'CV-' slot (a hypothesis originally discussed in Mohamed-Soyir 2005).

 **<sup>G</sup>** (31) C V - C V C V r a v u [ʈ ʂ ]

In (31), the second V slot governs and thus strengthens the first V position, leading to the fortition of the second consonant slot. The same configuration, usually referred to as *coda-mirror*, also explains the fortition of a post-coda consonant, according to Ségéral & Scheer (2001, 2005, 2008, among others). Gemination, if the analysis is retained, would thus differ from fortition in Shingazidja.

The final argument in favor of the analysis derives from the synchronic alternation between casual and formal speech. Consider, for instance, the verbal form in (32).

(32) [ɾ]í**[**ɾ**]**i̥ 'we played / we feared'

 In casual speech, the first vowel can be dropped, leading to the realization of a trill (33).

(33) a. **[r]**í mpi[ɹ̥]á 'we played a game' b. ( nɗa=mí )ɸ ( na=wé )ɸ ( **[r]**í mh̥u̥ )<sup>ɸ</sup> '(that's) me and you who feared God'

The trill realization in (33-a) is illustrated in Figure 5.

The trill realization in (33-a) is illustrated in Figure 5.

(32) [\_]í**[\_]**i\_ 'we played / we feared'

from fortition in Shingazidja.

In (31), the second V slot governs and thus strengthens the first V position, leading to the fortition of the second consonant slot. The same configuration, usually referred to as *codamirror*, also explains the fortition of a post-coda consonant, according to Ségéral & Scheer (2001, 2005, 2008, among others). Gemination, if the analysis is retained, would thus differ

The final argument in favor of the analysis derives from the synchronic alternation between

In casual speech, the first vowel can be dropped, leading to the realization of a trill (33).

casual and formal speech. Consider, for instance, the verbal form in (32).

'(that's) me and you who feared God'

Figure 5 – Spectrogram of *([\_])[r]í mpi[\_\_]á* 'we played a game'. Figure 5 – Spectrogram of ([ɾ])[r]í mpi[ɹ̥]á 'we played a game'.

In intervocalic position, /r/ is not able to spread through a vocalic association (34). Since it is not associated with two skeletal positions, it cannot emerge as a trill.

Because it is governed by the following vowel in the second version of the 'coda-mirror' representation (Scheer & Ziková 2011), the consonant cannot undergo fortition either.

(35) C V C V C V

#### 5.3 Remaining questions

I propose that the trill realization that appears after a consonant results from a joint association to the first C slot. Such a representation is, however, excluded by the model (Scheer 2009: "[…] consonants can geminate after floating, but not stable consonants").


Double association of segments to a single C position is indeed problematic, since it would make the model too powerful. However, the idea is still worth considering, since it may account for situations such as the distinction between prenasalized consonants, where two consonant elements would be associated with a single slot, and N+C sequences, where the same consonant elements would be linked to two different slots. This hypothesis, however, needs deeper examination if it is to be retained.

CVCV representations cannot account for the tap realization before [i], nor can they explain why a trill can emerge between a consonant and [i]. As for the absence of the trill before a front vowel in initial position, it should be noted that the trill is characterized by a contact between the tip of the tongue and the alveolar area. Such a configuration hardly corresponds to the position of the tongue during the production of [i]. Other Bantu languages do not allow the sequence [ri], e.g. Simakonde (Sophie Manus, *personal communication*),<sup>10</sup> and several studies have discussed the (relative) incompatibility of trills with (high) front vowels and/or palatalization: among others, Hall & Hamann (2009); Kavitskaya (1997); Żygis (2005). Recasens (2002:346), for instance, claims that "the occasional simplification of the trill before a high front vowel is rather associated with the difficulty involved in performing two successive antagonistic tongue dorsum gestures, i.e. tongue dorsum lowering and retraction for [r] and tongue dorsum raising and fronting for [i]".

This explanation alone fails to account for the emergence of the trill before [i] when the rhotic follows another consonant. The energy provided by the closure may explain this distinction.<sup>11</sup>

#### 6. Conclusions

This paper is the first discussion of the allophonic variation of /r/ in the Washili variety of Shingazidja, a Bantu language. It is claimed that /r/ has two main allophones: (i) a trill [r], which occurs in the absolute initial position – whenever the following vowel bears a high tone – and after a consonant; and (ii) a tap [ɾ], which is selected in intervocalic position, within a word and between two words. A CVCV analysis of this distribution has been sketched out, claiming that the trill allophone is underlyingly a geminate. However, the CVCV analysis fails to account for some facts, such as the alternation of /r/ before front vowels. Further investigations, including a deeper articulatory analysis, are required.

<sup>10</sup> A reviewer has pointed out that "there is an occurrence of [d] before [i] vs. a liquid before other vowels in Kikongo and Sotho-Tswana, and that there is dialectal documentation showing an [r]".

<sup>11</sup> Suggestion from Bernard Gautheron, *personal communication*.

#### Acknowledgements

For helpful discussion of several aspects of this work, I wish to thank Jean-Marc Beltzung, Kassim Mohamed-Soyir, Kathleen O'Connor, two anonymous reviewers and audiences at RFP 2011 and *'r-atics-3*. Many thanks to my informant Said Mohamed: this work could not have been completed without his help.

#### References


## Prosodic factors in the adaptation of Hebrew rhotics in loanwords from English

#### Evan-Gary Cohen, Tel-Aviv University

#### Abstract

The behaviour of rhotics in (Modern) Hebrew loanwords from English differs from that of all other consonants. Rhotics metathesise, and words containing rhotics show a preference for pseudo-reduplicative structures. Within an Optimality Theoretical framework, I argue that this unique behaviour results from the interaction among various universal well-formedness constraints, whose effect is unattested in native Hebrew grammar. This is evidence of the role of phonological universals in adult grammars.

#### 1. Introduction

This paper focuses on Hebrew rhotics in loanwords from English. The aberrant behaviour of rhotics in adaptation, exhibiting phenomena such as metathesis and reduplication, is explained by appealing to the role of universal well-formedness constraints on syllable and word structure. Crucially, the application of these constraints is not supported by the native Hebrew grammar, and is, I argue, evidence of the role of phonological universals in adult grammars.

#### 1.1 Basic assumptions

Grammatical principles operating in a language logically come from one of two sources: (a) the native grammar, or (b) universal principles (UG).

I assume that the lexicon is divided into strata (Itô & Mester 1999) or has a core-periphery structure (Paradis & LaCharité 1997). Such a structure allows for variable grammars within the lexicon. There are productive principles in the lexicon's periphery (e.g. loanwords, acronyms) which might not apply systematically to the native lexicon (Kenstowicz 2003; Shinohara 2004; Berent et al. 2009; Cohen 2011 inter alia). This may be evidence that we can and do access UG when the effects of L1 grammar are weakened. The emergence of such universal principles in the lexicon's periphery is known as The Emergence of the Unmarked (TETU, McCarthy & Prince 1994).

#### 1.2 Goals

The goal of this paper is twofold. First, I demonstrate that the non-native metathesis and reduplication of Hebrew rhotics in loanwords is systematic, i.e. subject to a grammatical system. Second, via the analysis of prosodic phenomena involving rhotics, I support an approach advocating the universal motivation of the rhotics' behaviour. UG may apply even in what appear to be stable grammatical systems, especially in the lexical periphery of such systems.

#### 1.3 Structure of paper

In §2, an overview of metathesis and reduplication in Hebrew is provided. This is followed in §3 by data displaying the behaviour of rhotics in loanwords. A formal analysis within an Optimality Theoretical framework in the subsequent §4 is followed by concluding remarks in §5.

#### 2. Metathesis and reduplication in the native Hebrew lexicon

The following section is an overview of the native Hebrew grammar, particularly with respect to metathesis and reduplication. I argue that the behaviour of rhotics in loanwords cannot be supported by this native grammar.

#### 2.1 General facts about Hebrew

#### 2.1.1 Rhotics

There is one rhotic in the native Hebrew inventory, [ʁ̞] (henceforth: ʁ), a uvular approximant with certain frication (Bolozky & Kreitman 2007). The precise manner of articulation is usually determined by prosody, with onsets displaying more frication.

#### 2.1.2 Syllable structure

Native Hebrew words have the following syllable structures


Table 1 – Syllable structure in native Hebrew words.

Complex margins are noticeably rare in native Hebrew words, with complex onsets appearing only word initially, and complex codas appearing only word finally in 2nd person feminine singular past. All complex edges respect the Sonority Sequencing Generalisation (SSG; Steriade 1982) allowing sonority rises and plateaus towards the vocalic nuclei, but never sonority falls (Bolozky 1978; Bat-El 1994).

Loanwords, however, have a richer syllabic inventory (Bat-El 1994; Schwarzwald 2002). Tri-consonantal sequences may appear if they respect the SSG and do not have sonorant clusters (e.g. *stʁuk.ˈtu.ʁa* 'structure', *tekst* 'text').
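The SSG condition on complex margins can be made concrete with a minimal sketch. The coarse sonority scale below is an assumption made for illustration only (all obstruents share one level, so that *st* and *kst* count as plateaus, and the rhotic is grouped with the liquids); the function names are hypothetical, and the additional ban on sonorant clusters is not modelled.

```python
# Coarse sonority scale (an assumption of this sketch, not taken from the paper):
# all obstruents share one level, so s+stop sequences count as plateaus.
SONORITY = {}
for seg in "ptkbdgfvszʃxh":
    SONORITY[seg] = 1  # obstruents
for seg in "mn":
    SONORITY[seg] = 2  # nasals
for seg in "lʁ":
    SONORITY[seg] = 3  # liquids, with the rhotic treated as a liquid
for seg in "jw":
    SONORITY[seg] = 4  # glides


def onset_respects_ssg(onset):
    """True if sonority never falls from the syllable edge towards the nucleus."""
    levels = [SONORITY[seg] for seg in onset]
    return all(a <= b for a, b in zip(levels, levels[1:]))


def coda_respects_ssg(coda):
    """True if sonority never rises from the nucleus towards the syllable edge."""
    levels = [SONORITY[seg] for seg in coda]
    return all(a >= b for a, b in zip(levels, levels[1:]))


print(onset_respects_ssg("stʁ"))  # onset of stʁuk.ˈtu.ʁa
print(coda_respects_ssg("kst"))   # coda of tekst
print(onset_respects_ssg("ʁt"))   # hypothetical onset with a sonority fall
```

Under these assumptions the two attested loanword margins pass the check, while the hypothetical falling onset does not.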

#### 2.2 Reduplication in Hebrew

There is productive *morphologically*-motivated reduplication in the Hebrew lexicon (Bat-El 2006).

First of all, reduplication is invoked in template (*binyan*) satisfaction. All verbs in Hebrew are subject to templatic restrictions imposed by one of the *binyanim*. Novel verbs are almost invariably formed within the *pi'el* template, a disyllabic *binyan* with a XiXeX vocalic pattern (e.g. *tsad* 'a side' → *tsided* 'to side with'; *daf* 'a page' → *difdef* 'to page through') (Bat-El 1994; McCarthy & Prince 1995; Gafos 1998; Ussishkin 2000).

In addition to template satisfaction, reduplication is a means of lexical expansion, the addition of new lexical items which are semantically similar to existing items (e.g. *iʃeʁ* 'to confirm' vs. *iʃʁeʁ* 'to ratify').

Finally, diminutives in nominals may be formed by reduplication (e.g. *dag/dagig* 'fish/small fish'; *xaziʁ/xazaʁziʁ* 'pig/piglet'; *kaxol/kxalxal* 'blue/bluish').

To sum up, reduplication is a derivational process in Hebrew, a strategy of stem formation whose purpose is to form different yet semantically related words. It is (almost invariably) at the word's right edge, and invariably forms prosodic structures already available as unreduplicated forms, unmarked (C)VC syllables, avoiding the creation of clusters (see discussion in §2.1.2 and §4.3 regarding clusters).

#### 2.3 Metathesis in Hebrew

In Hebrew, metathesis does not occur systematically, except for a single instance, strident-initial stems after the *hit-* prefix of the *hitpa'el binyan* (Schwarzwald 2002):


The stem-initial strident metathesises with the prefix-final *t*. This process is restricted to stem-initial stridents in *hitpa'el*.

Sporadic metathesis is also found in Hebrew during acquisition, where universal principles are known to surface (Berent et al. 2009), often overriding the native grammar. In these cases, metathesis serves specifically to avoid complex codas, with complex onsets preferred to them. For example, the adult forms *disk* 'disk' and *tost* 'toast' may be produced as *sdik* and *stot* respectively.

#### 2.4 Summary

Both reduplication and metathesis do occur in Hebrew. However, reduplication is morphologically restricted to lexicon expansion. Metathesis is not only morphologically restricted to the *hitpa'el binyan*, but is also segmentally restricted to stridents. Neither of the two processes is segmentally restricted to or unique to rhotics.

## 3. Rhotics in Adaptation

#### 3.1 Segmental adaptation

The segmental adaptation of rhotics, remarkably straightforward, is not relevant to the discussion of metathesis and reduplication. Cohen (2010) presents a 1383-word Hebrew loanword corpus, constructed from three different sources: elicitation from native speakers, spontaneous productions and previous publications on Hebrew loanwords. In this corpus, English rhotics are invariably adapted into Hebrew as the native rhotic ʁ. Note that many of the words in the corpus do not originate in English; however, they entered Hebrew via English mediation. Therefore, I generally refer to the words as loanwords from English. Word-final rhotics in words from non-rhotic English dialects, where the input has no surface rhotic, also surface as *ʁ* (e.g. British English *ˈɑftə* 'after (military term)' → Hebrew *ˈafteʁ*). The similarity-based phoneme mapping in the adaptation of rhotics into Hebrew has multiple sources in English, both perceptual (Lindau 1985; Ladefoged & Maddieson 1996; Magnuson 2007) and orthographic (Vendelin & Peperkamp 2006; Escudero et al. 2008), which may even provide conflicting evidence (Smith 2005; Cohen 2010:137).

#### 3.2 Prosodic phenomena in adaptation

Prosodic, rather than segmental, phenomena, restricted to rhotics, are at the focal point of this paper. In the realm of consonant adaptation in Hebrew loanwords, the behaviour of the rhotics is unique, as other consonants are ordinarily adapted 1-to-1 to the closest native category, with no prosodic modification.

The only instances in which there is some prosodic modification are: (a) deletion, to avoid complex syllable margins (e.g. *ɪgzɔst* 'exhaust' → *egzoz*; *hændbɹeɪks* 'handbrakes' → *ambʁeks*) and (b) epenthesis, to avoid certain complex codas (e.g. *fɪlm* 'film' → *filim*).

There are also two additional prosodic phenomena, both of which are unique in adaptation to rhotics: (a) neither is native to Hebrew grammar and (b) both are optional (variation among speakers and lexical items), but when they occur, they occur systematically. In addition to the corpus mentioned in §3.1, most of the examples in this paper were collected from speaker productions, both in conversation and in the media.

#### 3.2.1 Metathesis

*ʁ* is metathesised from coda into onset position:


Table 3 – ʁ-metathesis in Hebrew loanwords from English.

Note that, as mentioned in §3.1, some of the above words do not originate in English (e.g. *goʁgonzola*, *maskaʁpone*); however, they entered Hebrew via the English form, rather than directly from the source language (in these cases, Italian). The process is optional in colloquial Hebrew.

#### 3.2.2 Reduplication

*ʁ* is metathesised from onset into coda position, creating a reduplication-like structure (Zuraw 2002). Henceforth, I will refer to these cases as pseudo-reduplication<sup>1</sup>:


Table 4 – Pseudo-reduplication in forms with rhotics.

<sup>1</sup> In a single instance in loanwords, there is even ʁ-epenthesis, which creates a pseudo-reduplicated form: *dioʊdəɹənt* 'deodorant' → *doʁdoʁant*. A similar process is found in very few native Hebrew words, such as *ʃfofeʁet* 'telephone receiver' → *foʁfeʁet*.

While all the above forms in Tables 3 and 4 optionally undergo the processes mentioned, there are several forms where nothing happens. I do not account for this variation (or lack thereof) in this paper. The following data are some words in which none of the processes under discussion occurs:


Table 5 – Non-varying forms.

#### 4. Analysis

#### 4.1 Theoretical background

The underlying segmental representation undergoes modification resulting from constraint interaction (Optimality Theory, OT, Prince & Smolensky 1993/2004). Two types of constraint interact with one another: (a) *faithfulness* constraints requiring input-output correspondence and input string sequences to be preserved, e.g. constraints militating against deletion or metathesis, and (b) *markedness* constraints requiring surface forms to comply with universal preferences, which may force metathesis and pseudo-reduplication.

In OT, each given input has several possible outputs. Each possible output is evaluated by the language's grammar, which defines how *bad* each candidate is (there are no *good* candidates because no candidate is perfect, as all violate some constraint). The least bad candidate in a given set of candidates is the most harmonic candidate, the optimal candidate. The possible outputs are evaluated within a constraint based system in which the constraints are ranked and candidates are eliminated by evaluating them against the highest ranked constraint and then the other constraints in descending order until all but one candidate are eliminated. The remaining candidate is the optimal one and the selected output.
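The evaluation procedure just described can be sketched in a few lines. This is an illustrative sketch only: the function name `evaluate`, the placeholder constraints `C1`–`C3` and the candidates `cand_a`–`cand_c` are hypothetical and do not come from the analysis; violation counts are simply stipulated.

```python
def evaluate(candidates, ranking):
    """Return the surviving candidate(s) under strict constraint domination.

    candidates: dict mapping each output form to a dict of violation counts.
    ranking: list of constraint names, highest ranked first.
    """
    survivors = list(candidates)
    for constraint in ranking:
        # Keep only the candidates that do best on the current constraint.
        fewest = min(candidates[c].get(constraint, 0) for c in survivors)
        survivors = [c for c in survivors
                     if candidates[c].get(constraint, 0) == fewest]
        if len(survivors) == 1:
            break
    return survivors


# Hypothetical candidates with stipulated violation counts.
toy = {
    "cand_a": {"C1": 1},
    "cand_b": {"C2": 3},
    "cand_c": {"C2": 1, "C3": 2},
}

print(evaluate(toy, ["C1", "C2", "C3"]))  # ['cand_c']
print(evaluate(toy, ["C3", "C2", "C1"]))  # ['cand_a']
```

The same candidate set yields a different winner once the ranking is reversed, which is the sense in which a grammar is a ranking: no number of violations of a lower-ranked constraint can rescue a candidate that loses on a higher-ranked one.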

In addition to an OT grammar, I also assume that the lexicon is *stratified* (§1.1). Faithfulness constraints relevant to loanwords (and so indexed) may be ranked differently with respect to markedness constraints than faithfulness constraints pertaining to native words (Itô & Mester 1999).

#### 4.2 Metathesis

Why should metathesis occur at all? The English rhotic has long-range acoustic resonances (Kelly & Local 1986; Hall 2009). These result in rhotics being perceived where they are not actually present, and epenthesised due to this 'misperception' (e.g. *ʃɜɹbɪt* → *ʃɜɹbɜɹt* 'sherbet'). Because of this resonance, listeners perceive the rhotics but are not necessarily aware of their position in the input string. The Hebrew rhotic *ʁ*, for its part, is acoustically 'weak' in general and weaker still in coda position, more so than other consonants. Therefore, it 'favours' syllable onsets, which are more perceptible than codas. Consequently, rhotics perceived in whatever position preferably surface in onset position, if possible.

These observations can be translated into a formal OT grammar. The rhotic in Hebrew loanwords from English is assigned a prosodic position which facilitates its optimal perception in Hebrew, driven by perceptibility constraints based on models such as Steriade's (2001/2008) P-map. I assume the input to the grammar is the form as produced in English. I propose the constraint \*Coda-r:

#### **\*Coda-r: Rhotics are avoided in coda position.**

Note, although this constraint may be perceptually motivated, it seems to contradict the general notion that liquids are *good* codas in languages, as codas tend to be as sonorous as possible. Further cross-linguistic evidence for the proposed constraint is presented in §5.

In addition, there are faithfulness constraints militating against deletion or against changes in the linear order of the input segments, such as Max and LinearityN/LW:

#### **Max: All input segments have correspondents in the output (i.e. don't delete segments).**

#### **LinearityN/LW: Preserve linear order of input segments in native (N) and loanwords (LW) respectively (i.e. no metathesis).**

Recall the metathesised forms in §3.2.1 (e.g. /end**oʁ**finim/ 'endorphins' → *endʁofinim*). In native words, the faithfulness constraints LinearityN and Max are not violated. In loanwords, ordinarily, there is no reason to violate these constraints either, as all segments (except *ʁ*) are possible in codas. However, when the coda is a rhotic, the markedness constraint \*Coda-r is potentially violated. It is possible to avoid the violation of this markedness constraint by violating LinearityLW, metathesising the coda rhotic into onset position. However, this creates a complex syllable onset, violating a different markedness constraint:

#### **\*Cx – No complex syllable margins.**

Clearly, \*Coda-r outranks \*Cx (expressed as: \*Coda-r>>\*Cx), otherwise rhotics would never be metathesised out of coda position, subsequently creating complex onsets. The grammar considers three primary candidates, each violating different constraints<sup>2</sup>:


Table 6 – Evaluating candidates for /endoʁfinim/.

The fact that the grammar prefers *endʁofinim* is evidence that LinearityLW and \*Cx are the lowest ranked of the four constraints (no evidence of ranking between them – hence the dotted line indicating no crucial ranking). Since the potential violation of \*Coda-r is not satisfied by deletion, there is no evidence for any ranking between \*Coda-r and Max. However, both of these constraints are clearly more highly ranked than LinearityLW and \*Cx. Note, both Max and LinearityN are highly ranked in Hebrew, but while codas, in general, are disfavoured, the specific constraint \*Coda-r is not visible in the native lexicon, as it is dominated by LinearityN. This is where the notion of the Emergence of the Unmarked (TETU, §1.1) comes into play. Although native Hebrew words allow coda rhotics, providing no evidence for their universal markedness due to the high-ranking LinearityN, the unmarked structures without coda rhotics emerge in loanwords, as these are not subject to LinearityN, but rather to the lower ranked LinearityLW.

Since LinearityN preventing metathesis in native Hebrew nouns outranks \*Coda-r, the lexicon is effectively stratified into words which metathesise coda rhotics (loanwords) and those which do not (native).
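The stratification can be illustrated with a small sketch of the two rankings. The violation counts below are my reading of the prose description of the three primary candidates, not a reproduction of Table 6. Where the text reports no crucial ranking (between \*Coda-r and Max, and between LinearityLW and \*Cx), an arbitrary order is chosen, and running the loanword candidates through the native ranking is a purely hypothetical exercise. The helper name `optimal` is likewise an assumption.

```python
def optimal(candidates, ranking):
    """Pick the candidate whose violation profile, read from the highest
    ranked constraint down, is lexicographically smallest."""
    def profile(form):
        return tuple(candidates[form].get(c, 0) for c in ranking)
    return min(candidates, key=profile)


# Assumed violation counts for the input /endoʁfinim/.
CANDIDATES = {
    "endoʁfinim": {"*Coda-r": 1},              # faithful, keeps the coda rhotic
    "endofinim": {"Max": 1},                   # rhotic deleted
    "endʁofinim": {"Linearity": 1, "*Cx": 1},  # rhotic metathesised into the onset
}

# "Linearity" stands for LinearityLW in the first ranking, LinearityN in the second.
LOANWORD_RANKING = ["*Coda-r", "Max", "Linearity", "*Cx"]
NATIVE_RANKING = ["Max", "Linearity", "*Coda-r", "*Cx"]

print(optimal(CANDIDATES, LOANWORD_RANKING))  # endʁofinim
print(optimal(CANDIDATES, NATIVE_RANKING))    # endoʁfinim
```

With Linearity demoted below \*Coda-r, the metathesised candidate wins; with Linearity ranked above it, the faithful candidate with the coda rhotic survives, which is the pattern described for the native stratum.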

<sup>2</sup> Theoretically, there are more than three candidates, but these are the most important.

#### 4.3 Pseudo-reduplication

What is the motivation for non-morphological reduplication? Reduplication results in a string occurring twice in a single stem. Lexical representations of such forms are simpler than forms in which all segments are different (i.e. less lexical information) and the production of such forms is simpler, as it requires the repetition of articulatory motor sequences rather than introducing new sequences. This can be translated into an OT grammar. First of all, pseudo-reduplication (not the morphologically motivated type in §2.2) is motivated by constraints such as Bat-El's (2006) Copy, which requires strings to have two occurrences in stems, or Zuraw's (2002) Redup, requiring word-internal similar substrings. I adopt Zuraw's Redup.

#### **Redup – A word must contain some substrings that are coupled.**

Here, adjacent strings with identical vowels are assigned a reduplication-like structure via metathesis. Recall the constraints presented in Table 6 in §4.2: \*Coda-r, Max>>LinearityLW, \*Cx. The following tableau introduces the Redup constraint. I do not consider potential candidates which violate Max, and therefore, ignore Max (…) in the tableau. The grammar considers three primary candidates, evaluating them with the relevant constraints:


Table 7 – Evaluating candidates for /pʁopoʁtsija/.

The grammar proposed in Table 7 simply does not work! It selects the incorrect *pʁopʁotsija* (indicated by the A) rather than the actual winner *poʁpoʁtsija* (indicated by ✓). We expect the unattested /pʁopoʁtsija/ → *pʁopʁotsija*, which satisfies Redup and respects \*Coda-r>>LinearityLW (i.e. avoids codas by metathesising the rhotic into onset position). We do not expect /pʁopoʁtsija/ → *poʁpoʁtsija*: although this satisfies Redup, it violates LinearityLW as badly as the unattested form and additionally incurs multiple violations of \*Coda-r. It appears that multiple violations of the low-ranked markedness constraint barring complex margins are considerably worse than a single violation, and are to be avoided even at the expense of a high-ranked constraint (in our case, \*Coda-r).

This is captured by the notion of constraint conjunction (Kirchner 1996; Moreton & Smolensky 2002), particularly that of self-conjunction (Itô & Mester 1998). Self-conjunction encapsulates the idea that multiple violations of a single constraint have a cumulative effect. While a single violation may be low-ranked in the overall scheme of things, multiple violations are more highly ranked. This is similar in effect to the notion of constraint weighting, whereby all constraints have values, rather than relative rankings, and values are cumulative (Pater 2009; Smolensky & Legendre 2006; Prince & Smolensky 1993/2004:236). I will not argue for either model, but adopt self-conjunction in my analysis. The following tableau demonstrates the application of the self-conjoined \*Cx+\*Cx (note, for simplicity's sake, Max and LinearityLW have been omitted (…) from the table):


Table 8 – Evaluating candidates for /pʁopoʁtsija/ with self-conjunction.

The self-conjoined \*Cx+\*Cx outranks \*Coda-r, thereby selecting the correct output, *poʁpoʁtsija*. The low-ranked \*Cx has a cumulative effect expressed by the self-conjoined constraint. Multiple violations allow the effect of the markedness constraint \*Cx to surface.
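The effect of self-conjunction can be checked with a minimal sketch. Several things here are assumptions rather than the paper's own statements: the violation counts are inferred from the prose (the tableaux themselves are not reproduced), *ts* is treated as a single affricate that does not violate \*Cx, Redup is placed at the top of the ranking, the conjoined constraint is taken to be violated once whenever \*Cx is violated at least twice within the word, and the helper names are hypothetical.

```python
def optimal(candidates, ranking):
    """Pick the candidate with the lexicographically smallest violation
    profile, reading constraints from highest to lowest ranked."""
    def profile(form):
        return tuple(candidates[form].get(c, 0) for c in ranking)
    return min(candidates, key=profile)


def add_self_conjunction(violations, constraint="*Cx"):
    """Return a copy with one violation of the self-conjoined constraint
    whenever the base constraint is violated two or more times."""
    extended = dict(violations)
    doubled = violations.get(constraint, 0) >= 2
    extended[constraint + "+" + constraint] = 1 if doubled else 0
    return extended


# Assumed violation counts for the input /pʁopoʁtsija/.
BASE = {
    "pʁopoʁtsija": {"Redup": 1, "*Coda-r": 1, "*Cx": 1},  # faithful
    "pʁopʁotsija": {"Linearity": 1, "*Cx": 2},            # two complex onsets
    "poʁpoʁtsija": {"Linearity": 1, "*Coda-r": 2},        # two coda rhotics
}
CONJOINED = {form: add_self_conjunction(v) for form, v in BASE.items()}

WITHOUT_CONJUNCTION = ["Redup", "*Coda-r", "Linearity", "*Cx"]
WITH_CONJUNCTION = ["Redup", "*Cx+*Cx", "*Coda-r", "Linearity", "*Cx"]

print(optimal(BASE, WITHOUT_CONJUNCTION))    # pʁopʁotsija
print(optimal(CONJOINED, WITH_CONJUNCTION))  # poʁpoʁtsija
```

Under these assumptions the ranking without the conjoined constraint picks the unattested form with two complex onsets, while ranking \*Cx+\*Cx above \*Coda-r picks the attested pseudo-reduplicated form.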

This is an additional instance of TETU (§1.1). Although Hebrew does allow clusters in native words, it disfavours them, all things being equal. Specifically, clusters are barred in reduplicated elements, as this necessarily results in multiple violations of \*Cx. Incidentally, cluster avoidance in Hebrew is not unique to reduplication but appears in other peripheral word formation such as acronyms (Zadok 2002), where clusters are avoided. Specifically with respect to reduplicated forms, crosslinguistically, reduplicants tend to be more unmarked than their bases, avoiding clusters when possible, even when these exist in the base. An example is the Klamath distributive formation *dje:mi* → *de-dje:m-a* 'be hungry' (Steriade 1988:131).

#### 5. Discussion

The unique behaviour of rhotics in Hebrew loanwords from English can be explained within a formal, systematic grammar. Crucially, this grammar differs from the native grammar of Hebrew, i.e. there are parallel grammatical systems (core-periphery, lexical strata). This suggests that the grammar in certain (peripheral) areas of the lexicon (in our case, loanwords) may differ from the grammar in other strata in the lexicon.

Since the grammar under discussion is not supported by Hebrew grammar, its constraints have to be universally motivated, supporting the notion that adults have access to UG (as they could not derive the grammar of rhotics from the ambient language). So it appears that \*Coda-r, unmotivated by Hebrew grammar, must be part of UG. The question is whether this proposed constraint has any independent support.

Lindau (1985) states that crosslinguistically, rhotics tend to vocalise and even delete in coda position (post-vocalically), implying that coda positions disfavour rhotics. Evidence for this is found in English, Dutch, German, Swedish and numerous other languages, where coda rhotics may be deleted or vocalised (e.g. Itô & Mester 2001 for German coda rhotics).

An additional means of avoiding coda rhotics is the metathesis of rhotics over vowels, which is well supported via articulatory coordination with vocalic nuclei (Hoole et al. 2013). In Rumanian (Savu 2013), *grədinə* '**g**a**r**den' is evidence of coda-nucleus metathesis, avoiding coda rhotics. In addition, in Rumanian loans from Slavic, syllabic rhotics are invariably adapted as complex onsets (e.g. Czech [**br**̩no] 'Brno' → Rumanian [**brə**no]). In Ladino (Judeo-Spanish), there is systematic rC → Cr metathesis in loanwords from other languages (e.g. Spanish *gordo* → Ladino *goðro* 'fat'; Spanish *verde* 'green' → Ladino *veðre*).

In the Sardinian dialect of Sestu Campidanian (Bolognesi 1998), coda rhotics are shifted into onset position when the root is preceded by a determiner (e.g. ˈ**or**ku 'ogre' vs. s:**ro**k:u 'the ogre').<sup>3</sup>

The behaviour of rhotics in Hebrew loanwords from English supports an approach by which adults have access to universal grammatical principles, which surface in the lexical periphery even when these are unsupported by native grammars (TETU). Universal Grammar may apply even in what appear to be stable grammatical systems, albeit in the lexical periphery of such systems.

<sup>3</sup> Note, Alber (2001) does not analyse this as metathesis from coda to onset position, as Bolognesi also provides examples of onset rhotics being metathesised into stem initial position.

#### Acknowledgements

I would like to thank the participants of the *'r-atics-3* conference in Bozen-Bolzano for their input regarding many of the ideas expressed herein. Special thanks to Outi Bat-El for her assistance in this research. The invaluable comments from two anonymous reviewers contributed considerably to the analyses in the paper. I take full responsibility for any shortcomings this paper may have.

#### References


# Part III

Language variation and change

## A preliminary contribution to the study of phonetic variation of /r/ in Italian and Italo-Romance

#### Antonio Romano, Università degli Studi di Torino

#### Abstract

This paper aims to offer a first contribution to the phonetic description of the different realisations of /r/ in the present-day Italo-Romance languages spoken in Italy. From a descriptive point of view, it discusses a selection of phonetic phenomena observed in current use, most of which have been confirmed by experimental evidence.

Descriptions are based on a sample of a thousand *r*-realisations from different speakers (of different origins and with idiosyncratic phonetic properties) and are offered in terms of 'narrow phonetics'.

#### 1. Introduction<sup>1</sup>

Although the main sources of variability described in the Italian domain for these sounds are stylistic and – to a lesser extent – diatopic, very few details on them are generally given in sociolinguistic or dialectological studies. Behind the traditional dichotomy between an apical *r* and a uvular *r* (sometimes masked by general labels, which were used to describe quite different classes of sounds following the authors' impressions) stands a typological vagueness which characterises not only widely circulated books, but also part of the scientific literature.

A symptom of the different considerations connected with *r*-pronunciation is the disagreement on the sociolinguistic status accorded to some *r*-sounds in phonetic studies. Even when the authors agree on their articulatory description, different opinions on the prestige status of dialectal variants clearly reveal the incomplete (and, more often, non-uniform) knowledge of the geographical and social variability of these units within the Italian diasystem. Bypassing the presence of a large number of interesting phenomena involving the realisations of /r/ and /rr/, significant emphasis is given to what is usually called *erre moscia* ('limp' or 'lifeless r') which is a simple way – as has been underlined by some phoneticians (see Canepari 1979 and Mioni 1986) – to class together, in the same stereotyped category, more than ten phonetically different basic articulations.

<sup>1</sup> This paper reproduces some contents of the communication presented at *´r-atics-2: 2nd International Workshop on the Sociolinguistic, Phonetic and Phonological Characteristics of /r/* (Université Libre de Bruxelles, 5-7 Dec. 2002).

For all these *r*-sounds, besides checking the supposed presence of uvular vibrations, we need a better articulatory description, including details about the place and the degree of constriction, the dynamics of vibrations (when really present), voicing properties, and so on.

This paper summarises the preliminary descriptive work I prepared in view of research I carried out in this domain from 1998 to 2003 and whose results showed that, besides a dependence on general patterns of temporal organisation, speakers have recourse to different strategies to obtain non-apical *r*-sounds by using the acoustic (and perceptual) effects of rapid changes in frequency patterns.

#### 2. Rhotics' variability: functional principles, articulatory strategies and acoustic cues

Rhotics are a broad class of speech sounds whose articulatory and acoustic properties are known to be particularly speaker- and language-dependent (Stevens 1989:40). They are basically associated with apical trills, usually described as the central members of this class, but an enormous variety of other sounds can be found in various languages and dialects.

While phonetic modelling reveals that an efficient tongue-tip vibration depends on airflow, impedance, and appropriate apical control (McGowan 1992:2903; Widdison 1997:191), apical trills are also described as articulatory gestures with narrower aerodynamic requirements than other sounds (Recasens 1991; Solé 1999). That could be a valid reason explaining why they usually undergo all kinds of variation and why they are interesting for sociolinguistics (Labov 1972; various papers in Van de Velde & Van Hout 2001).

In the literature, trills are described as extremely fine articulations:

"Learning to make a trill involves placing the tongue, very loosely, in exactly the right position so that it will be set in vibration by a current of air. [...] The problem experienced by most people who fail to make trills is that the blade of the tongue is too stiff " (Ladefoged 1993:169).

In the past decades, Barry (1997) and Catford (2001) reopened the classical debate on the historical evolution of *r*-sounds in different languages.

Well-known case studies have been traditionally represented by French and German, whereas nowadays many other languages, including Italian, show interesting social dynamics involving *r*-sounds.

For standard Italian, the phonological starting assumptions are that an apicoalveolar phoneme /r/ contrasts in intervocalic position, following the consonant gemination pattern generalised in the whole system, with a geminate counterpart /rr/, whereas in other languages such as *RP* English or 'normative' French only one rhotic phoneme is synchronically acknowledged, with realisations respectively described as an alveolar approximant and a uvular fricative or approximant (or even a trill in some varieties; Demolin 2001).<sup>1</sup> A functional view allows us to assume that the sounds that realise the two phonological units in Italian are therefore, in both cases, apical trills with a different number of contacts.<sup>2</sup>

Nevertheless, trills are not just series of taps: they are quite different from taps in that the body of the tongue is subject to a higher degree of constraint during the production of a trill than of a tap (Recasens 1991; Kavitskaya 1997).

As discussed in the present paper, in a number of Italian idiolects single rhotics are not trilled (a distributional analysis of /r/ and /rr/ allophones is given in Canepari 1979, 1999; see §3)<sup>3</sup>. Acoustic cues associated with the articulatory properties of these allophones have been extensively analysed for the different languages where they are mainly attested (e.g. Meyer-Eppler 1959; Delattre 1966, 1971; Ladefoged et al. 1977; Hagiwara 1995; Schiller & Mooshammer 1995; Alwan et al. 1997).

<sup>1</sup> The actual phonetic implementation of French rhotics is often disregarded in favour of a supposed prevalence of uvular trills. Ladefoged & Maddieson (1996:225) observe that "Uvular trills occur in some conservative varieties of Standard French and Standard German, although most speakers of these languages use uvular fricatives or approximants rather than trills". Results of research (partially published in Billiez et al. 2002) which I presented at *'r-atics-2* showed a fricative/approximant to be the more common realisation of French /ʁ/.

<sup>2</sup> As a general reference, see Ladefoged & Maddieson (1996:218-219), who note that "[i]n Italian, single and geminate forms of most consonants contrast in intervocalic position" and that "[t]he single/geminate opposition also applies to trills". In repetitions of the words *caro* and *carro* by five speakers of Standard Italian, they found none of the intervocalic single trills to have more than two contacts while the geminate trills showed no fewer than three contacts and up to seven (Ladefoged & Maddieson 1996:221).

<sup>3</sup> In Romance languages, a distinction is usually made between 'polyvibrants' and 'monovibrants' without further defining different articulatory possibilities within these classes (this choice traditionally matches the two-way perceptual distinction proposed in Barry 1997:40 accounting for single-strike vs. multi-strike *r*-sounds). In Canepari (1999:101) we may find a finer classification for monovibrant *r*-sounds in Italian, where they are divided into two categories: *vibrati* (taps) and *vibratili* (flaps). Previously Mioni (1986:45) defined taps as *battiti* 'beats', and flaps as *scatti* 'triggers'. Even though taps and flaps are not elsewhere generally distinguished in the literature (see Barry 1997:38), a clear distinction is proposed by Ladefoged & Maddieson (1996:232).

With regard to their description in terms of timing, vibration frequencies and dynamic properties (as they appear under acoustic inspection), one can refer to Ladefoged & Maddieson (1996:218), who observe closed and open phases in the order of 25 ms each<sup>4</sup>.
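The relation between phase durations and vibration rate is simple arithmetic, and a minimal sketch may help when comparing figures reported in milliseconds with figures reported in hertz. The function names below are hypothetical, and the 25 ms values are only the order of magnitude cited above, not new measurements; the sketch also assumes that one trill cycle consists of exactly one closed and one open phase.

```python
def vibration_rate_hz(closed_ms: float, open_ms: float) -> float:
    """Rate (Hz) of a trill whose cycle is one closed plus one open phase."""
    cycle_ms = closed_ms + open_ms
    if cycle_ms <= 0:
        raise ValueError("phase durations must sum to a positive value")
    return 1000.0 / cycle_ms


def cycle_duration_ms(rate_hz: float) -> float:
    """Duration (ms) of one full closed-plus-open cycle at a given rate."""
    if rate_hz <= 0:
        raise ValueError("rate must be positive")
    return 1000.0 / rate_hz


# Illustrative values only: phases of 25 ms each, and a rate of 25 Hz.
print(vibration_rate_hz(25, 25))  # 20.0
print(cycle_duration_ms(25))      # 40.0
```

Under this reading, closed and open phases of about 25 ms each correspond to a rate of about 20 Hz.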

Likewise, a single vibration in intervocalic position is traditionally described, in most varieties of Spanish and Italian, as having a mean closure duration of about 20-25 ms (Vagges et al. 1975; Quilis 1981; Contini 1983; Recasens 1991)<sup>5</sup>.

In their survey of Florentine students, Vagges et al. (1978:3) showed that 7 speakers out of 10 realised /r/ as a monovibrant, 2 as a flap and 1 as a "multivibrant"<sup>6</sup>.

Concerning spectral distinctions between uvular and apical *r*-sounds, an interesting framework is provided for a number of languages in some traditional acoustic approaches (see for instance Jakobson 1957; Fant 1960, 1968; Delattre 1966, 1970)<sup>7</sup>.

#### 3. Italian r-sounds

Concerning the actual status of Italian *r*-sounds, the literature is relatively sparse. As partially introduced in §1, the main interest is devoted to *r*-variability in some geographical varieties and to the diffusion of defective variants known as *r moscia*, to which, as far as I know, no instrumental study has been explicitly dedicated.

<sup>4</sup> Temporal characteristics of trills are detailed in reference to studies surveyed in Ladefoged & Maddieson (1996:226). Measures for the mean vibration rate for trills are in the range 26-30 Hz. Though anatomically very different, bilabial, apical and uvular trills vibrate at similar frequencies. Ladefoged et al. (1977) proposed an explanation based on the compensation of the difference between the masses involved by a decrease in the articulators' tenseness.

<sup>5</sup> In the examples given by Ladefoged & Maddieson (1996:231) for two tap realisations, the spectrograms show durations shorter than the mean duration I found in Italian single *r*'s by about 20 ms and 25 ms (see §3).

<sup>6</sup> Mean duration and standard deviation of 25 ± 18 ms are reported for intervocalic *r* in repetitions of one word. Similar values are reported by Contini (1983) in his acoustic analysis of Sardinian "constrictives à battements" whose realisations are single-strike, with a typical duration of 20-30 ms, or multi-strike, with 2-5 interruptions of similar durations and an interrupted spectrum similar to the one of a central vowel (Contini 1983:414-415). In Vietti et al. (2009) 138 single-*r* postvocalic realisations are measured for speakers of 16 Italian cities in laboratory productions: a single-strike apical trill appeared in 38% of cases (a tap perhaps only in 6% of cases), with durations in the expected range (25 ± 6 ms). Another 5% are 'smoothed' taps, whereas 18.3% are 'broken' approximants and 6.7% are regular approximants; 7% were single-strike apical trills with a longer duration (31 ± 6 ms) and acoustic characteristics similar to those of a voiced stop. Velar, uvular and pharyngeal realisations (usually uvularised or pharyngealised alveolar taps) rank, mainly for northern speakers, up to 10%; a (somewhat lateralised) flap is dominant for Venice speakers (6%), while vowel rhotacism and *r*-deletion are limited to a residual 5% of cases.

<sup>7</sup> See details in Ladefoged & Maddieson (1996:226-231). In Romano (forthcoming), I support a general view of a vowel-colouring of *r*-sounds ([ə]-like when apical and [o]-like for back articulations) and suggest possible spectral dynamics for single-*r* variants.

A simplified description of the Italian phonological system basically assumes a phoneme /r/ and its geminate counterpart /rr/ whose phonetic realisations, as already discussed in §1, are both apical trills with a different number of contacts. In Canepari's traditional finer analysis, summarised in Canepari (1999:97-102), the phoneme /r/ is associated with both [r] and [ɾ] (the latter mainly occurring in unstressed syllables). A detailed distributional analysis is given in the following passage:

"[N]ella pronuncia neutra odierna effettiva abbiamo, normalmente [r] in sillaba accentata: [(C/V)ˈrV-,ˈCrV-, ˈVrːC(V),ˈV(ː)r#] (oppure, solo come variante occasionale, non sistematica, e non enfatica, [ɾ]). Mentre negli altri casi si ha [ɾ]: [ˈVːɾV, (V/C)(ˌ) ɾV-, Vɾ-, -ɾ(ˈ)C-] (oppure come variante possibile, specie per enfasi, [r]). Per /rr/ si ha: [ˈVrːɾV, VɾˈrV, (ˌ)VɾɾV, Vɾ(ˌ)ɾV] (oppure anche [rːr, rr], soprattutto per enfasi)" (Canepari 1999:97-98)<sup>8</sup>.

Using an instrumental approach, I checked the examples proposed by Canepari (1999:328) (*raro* [ˈraːɾo] < /ˈraro/ 'rare', *parlare* [parˈlaːɾe] < /parˈlare/ 'to speak', *Mario* [ˈmaːɾjo] < /ˈmarjo/, *carro* [ˈkarːɾo] < /karro/ 'cart', *Enrico* [enˈriːko] < /enˈriko/) which presented various phonotactic solutions. The speech sample came from the tape associated with Canepari's handbook and the speaker was a professional male speaker with no particularly evident regional traits. Waveforms and spectrograms are displayed in Fig. 1 with the help of *WASP* (1.02).

<sup>8</sup> An updated source is now provided by Canepari (2005).

Figure 1 – Waveforms and spectrograms (obtained with the program WASP, thanks to M. Huckvale, UCL) showing standard Italian pronunciation for /r/ in the words (by a professional male speaker, see Canepari 1999): (upper row) raro 'rare', parlare 'to speak', Mario '(person's name)'; (lower row) carro 'cart', Enrico '(person's name)'.

Taps appear only in the positions allowed by a phonetic reduction rule. Their realisation is restricted to the intervocalic unstressed position or to the 'explosive' phase of /rr/<sup>9</sup>. Nevertheless, they may have a closing phase longer than 50 ms, which is slightly (and suspiciously) longer than the one usually measured for taps in other languages (see §2)<sup>10</sup>. Other /r/ realisations (such as the coda /r/ in the first unstressed syllable of *parlare* and the onset of the stressed syllable of *raro* and *Enrico*) are not single-strike sounds and are realised with a 2-3-strike trill, as against the longer 5-strike trill realising /rr/.
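The default distribution quoted from Canepari (1999:97-98) can be read as a small decision rule, and the following minimal sketch makes that reading explicit. It is an illustration only: the function names and the stress labels are hypothetical, the occasional and emphatic variants mentioned in the quotation are ignored, and the output is the expected allophone under the quoted scheme, not the realisation measured in the recordings.

```python
TRILL = "r"  # multi-strike apical trill
TAP = "ɾ"    # single-strike apical tap


def single_r(in_stressed_syllable: bool) -> str:
    """Default allophone of /r/: [r] in a stressed syllable, [ɾ] elsewhere."""
    return TRILL if in_stressed_syllable else TAP


def geminate_rr(stress: str) -> str:
    """Default realisation of /rr/ by position of the stress.

    stress: "preceding" (stressed vowel before /rr/),
            "following" (stressed vowel after /rr/),
            "none"      (no adjacent primary stress).
    """
    patterns = {
        "preceding": "rːɾ",  # as in the quoted [ˈVrːɾV]
        "following": "ɾr",   # as in the quoted [VɾˈrV], stress mark omitted
        "none": "ɾɾ",        # as in the quoted [VɾɾV]
    }
    if stress not in patterns:
        raise ValueError("stress must be 'preceding', 'following' or 'none'")
    return patterns[stress]


# raro: initial /r/ in the stressed syllable, second /r/ in an unstressed one
print(single_r(True), single_r(False))  # r ɾ
# carro: /rr/ after the stressed vowel
print(geminate_rr("preceding"))         # rːɾ
```

These outputs agree with the transcriptions [ˈraːɾo] and [ˈkarːɾo] cited from Canepari.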

<sup>9</sup> According to Rousselot (1913:53): "L'r double se comporte donc comme les autres consonnes redoublées qui, doubles pour l'oreille, ne sont, au point de vue articulatoire, que des consonnes simples fortes et longues", but these sounds lead to a phonetic distinction: "La 1re *r* entendue est une *r* implosive ; la 2e, une *r* explosive" (ibid.). In agreement with Canepari's distributional scheme, as shown further on in this work, this assumption for Italian does not conflict in principle with Inouye's (1995) generalisation of a phonetic length feature for the relationship between trills and taps, which remains valid for languages without geminate/singleton contrasts.

<sup>10</sup> As shown by some extensive studies (Vagges et al. 1978; Ferrero et al. 1979), Italian apical *r*'s are resistant to coarticulatory effects of neighbouring sounds (for apical trills in general, see Lindau 1985).

Regional varieties of Italian follow the same distribution, with intervocalic single rhotics realised as single-strike sounds<sup>11</sup>. On the basis of a number of items I analysed in spontaneous dialogues for different varieties (variously disposed to tap spreading in other phonotactic contexts, see Romano, forthcoming), I observed that single-strike *r*-sounds tend to preserve a higher tension in the vowel-to-consonant transition than the one usually reported for languages described as tap-languages (cp. Vietti et al. 2009).

#### 4. (Not only) Back r-sounds in Italian

While the term *grasseyé* is nowadays used in French to refer to a variety of non-apical sounds, the general category for Italian *r*-sounds differing from the vibrant sounds described as standard is traditionally labelled *r moscia* 'limp or lifeless *r*' (see §1).

As shown by some phoneticians (see Canepari 1979; Mioni 1986), limp or lifeless *r*-sounds are in reality quite different articulations which have been gathered in order to denote defective or snobbish pronunciations.

People using a different kind of *r* are euphemistically said to have a French *r* (*r alla francese* or, simply, *r francese*) even when these sounds have nothing to do with the French *r*-pronunciation. Other common expressions to indicate a burrer are just *ha la erre* '(s)he has the *r*' or, in some cases, *non ha la erre* '(s)he does not have the *r*'. In other cases the *r*-pronunciation is not lifeless at all (e.g. the case of long uvular trills) but the label *r moscia* is extended to them by some informants. On the other hand, I encountered the term *r pizzicata* 'pinched *r*', which is also used in some regions with regard to this sustained but still 'different' pronunciation.

In the literature there is disagreement on the sociolinguistic status of such *r*-sounds because different opinions are expressed on the prestige status of idiolects which contain these sounds. This reveals an incomplete (non-uniform) knowledge of the geographical and social variability of this phenomenon in Italy. As is also remarked in Ladefoged & Maddieson (1996:226), Ladefoged et al. (1977) described a uvular trill appearing in Italian in a prestige dialect (but, since there is no clear-cut social differentiation for these sounds, idiolects must have been considered)<sup>12</sup>.

<sup>11</sup> Inouye (1995) demonstrated that intervocalic tapping of trills is widespread crosslinguistically (in this case only as realisations of a single consonant).

<sup>12</sup> Traditional dialects described as having uvular *r*-sounds are in northern areas (almost exclusively north-western dialects or in the bilingual areas in the North-East, on the boundary with German-speaking countries) but there is no particular reason to consider them as prestige dialects. Indeed, individual burrers have been identified everywhere in villages from North to South where specific burring styles are widespread and are sometimes promoted as markers of local socio-geographical identity.

Chambers & Trudgill (1998:191) write about a "uvular /r/ only in some educated speech" but even that description does not reflect the real Italian situation, where the usage of this kind of *r*-sound is still considered (as it was for French in the past centuries, see above) a pronunciation defect or, in some cases, a symptom of snobbery and affectation, more than 'education'.

In most cases the sounds labelled as *r mosce* are even considered as 'pathological'. A similar position is expressed by Widdison (1997:189), who includes Italian back *r*-sounds among the cases of "deviation from the norm" (and this applies not only to northern Italian).

Canepari (1999) includes them, among pronunciation defects, in a detailed articulatory classification (sometimes making use of finer non-IPA phonetic notations):

"[C]'è una certa varietà d'«erre mosce» usate in italiano per caratteristiche individuali. Ci sono quattro tipi uvulari sonori, rispettivamente: vibrante [...], costrittivo [...], approssimante [...] e vibrato [...]. [{] è il tipo normale in lingue come il francese belga, [] in tedesco, [] in francese; [] è un suono piú debole, che può ricorrere come variante occasionale. [...] Altrove, comunque, possono essere piú o meno diffuse in tutte le regioni [...]. Un altro tipo piuttosto frequente d'« erre moscia » è l'approssimante sonoro labiodentale [ʋ] [...] che, nella variante uvularizzata [], suona rivoltantemente snobistico in italiano" (Canepari 1999:98)<sup>13</sup>.

In fact, rather than being prestige variants, different types of *r mosce* appear everywhere, even in rural areas and in lower socio-economic conditions, and are often considered to be a pronunciation defect. Barry (1997) remarks that the apical *r*-pronunciation is simply something that a number of speakers in any country just cannot produce:

"In Italy and Spain, and Bulgaria, where trilled and/or flapped lingual «Rs» are de rigueur, efforts are made at primary school level to help children with problems. A good proportion do indeed achieve the goal, but there are always «pathological» cases which have to resort to e.g. a «labial R»" (Barry 1997:38).

<sup>13</sup> Referring to *r moscia*, the author gives very useful phonetic details when he observes that these sounds "in italiano di solito si accompagnano anche a una struttura sillabica caudata piú «strascicata» /ˈVC/ [ˈVˑC] (invece di [ˈVCː])". Furthermore, a better account of the conditions in which these pronunciations appear is in the following passage: "Non raramente alcuni tipi d'« erre moscia » sono usati volontariamente come degli *xenofonemi* stilistici, parlando in italiano, anche se spesso i risultati sono ridicoli e insopportabili. Di solito, l'erre moscia dà un'impressione d'affettazione" (Canepari 1999:99-100).

Some concessions are made by Mioni (1986) who gives a reduced list of possible *r*-variants and writes:

"Tutti questi foni sono possibili sostituti di /r/ in patologia anche se l'uvulare [ʁ] è così ampiamente diffusa tra gli italiani, che ci si può domandare se debba ancora essere considerata come deviante" (Mioni 1986:46, n. 27).

A more tolerant opinion is expressed by Canepari (1999):

"[I]n alcune zone d'Italia la realizzazione piú diffusa per /r/ è uvulare [ ʁ ʀ], che localmente può essere considerata quasi il tipo « normale », mentre l'articolazione alveolare diviene minoritaria; si tratta dell'Alto Adige, della Val d'Aosta e di buona parte della provincia di Parma" (Canepari 1999:101)<sup>14</sup>.

However, if I were to give an estimate of the quantity of *r moscia* pronunciations in (mainly urban) northern Italy, I would say that fewer than 10% of speakers systematically resort to this kind of (various) pronunciation (perhaps more than 10% only in Piedmont and in the Parma province)<sup>15</sup>.

As for the Italian back *r*-sounds, the origin of the irregular presence of these pronunciation styles is rarely investigated (Migliorini 1992:485 reports a source of the 17th c. referring to a French-style imitation).

High society French models have traditionally been described as the origin of the diffusion of back *r*-sounds in various central and northern European languages (see, among others, Chambers & Trudgill 1998), but several authors quoted in Van de Velde & van Hout (2001), Van de Velde et al. (2013) and Sankoff & Blondeau (2013) claim an older and independent origin for different areas (e.g. Holland and the Rhineland). The theory of the French back-*r* spread could be valid for some Italian areas but other hypotheses cannot be excluded<sup>16</sup>.

<sup>14</sup> For a 'normal' diffusion of uvular *r*-sounds in the area of Parma see Canepari (1999:387; also see a few comments at p. 381, about a possible diffusion in northern Lombard provinces, cp. Rohlfs 1966:377). A socio-phonetic survey of *r*-sounds in the Parma province is now presented in the first section of Felloni (2011).

<sup>15</sup> By contrast, I would estimate that French back-*r* pronunciation stands everywhere above 90%. This should give an idea of the difference between the two situations.

<sup>16</sup> Fundamental contributions have been given by Bonnard (1982) who collected elements to show that the back *r* is a creation of a high socio-economic class and dates back to a period between the 15th and 17th c. The change took place as a consequence of the raising of the tongue dorsum towards the velum (with or without flapping of the uvula). This kind of explanation is adopted in Delattre (1966:207). The French *r* shift is interpreted by this author as the consequence of a language-dependent articulatory constraint. Carton (1974:164) seems to go in the same direction accounting for an effect of "vocalic anticipation" but concludes in favour of a social explanation. Nevertheless, the same holds for Italian (or Spanish) where the trill is even considered articulatorily complex (Francescato 1970:75-76), and is often replaced by /l/ or uvular sounds by some children at the first stages, but nothing stops the acquisition of the apical trill, which progressively asserts itself among various allophones.

In the evolution of the Italian language and of Romance dialects spoken in Italy, a significant number of different phenomena, related to sound changes and derivational processes, involved rhotics. Besides the alternations inherited from Latin, and general properties related to liquids in Romance dialects, various outcomes are usually described (see Romano 2008 for details).

In present-day Italian, according to Canepari (1999:101-102), one should take into account at least the following *r*-variants as typical realisations in some regions, even though some speakers may have recourse to other choices.

A single-strike articulation is widespread in northern areas in almost all the contexts (even as a /rr/ realisation in conservative accents) but, in association with velar, uvular or pharyngeal realisations described above, Piedmont, Aosta Valley and part of Emilia-Romagna and Lombardy have an apical trill usually uvularised [] [...] whereas in Liguria an alveolar uvularised tap [] seems more frequent (see §2).

Among the most interesting regional *r*-sounds there are north-eastern alveolar approximants and taps which are generally lateralised (and therefore they really sound as liquid-*r*'s). In Venice the most common *r*-realisation is a postalveolar (somewhat retroflex) flap tending to show lateralisation (see above; cp. with retroflex flaps studied in Kvale & Foldvik 1995). These sounds realise /r/ in almost all the positions, often violating the general scheme illustrated in §3<sup>17</sup>.

Slightly different varieties of these sounds can be heard in coastal areas of Tuscany (on the Tyrrhenian coast; see Romano, forthcoming).

In particular, I would like to emphasise that these *r*-variants are rarely perceived as marked and are usually attributed to a regional 'accent'. These sounds could be described as a kind of more retracted retroflex approximant (something like a [ɻ=]) and occur as a realisation of /r/ in internal coda position or as the implosive phase of /rr/. They are particularly evident in stressed syllables in casual speech<sup>18</sup>. In Sicilian and southern Calabrian, word-initial *r*'s traditionally undergo a lengthening process: initial long trills are frequently realised as cacuminal (or retroflex) fricatives. Most of these pronunciations are also common in the speech of conservative speakers when they speak their regional Italian<sup>19</sup>. Moreover, in the same regions, -*tr*- and -*dr*- are subject to affrication, yielding postalveolar stops or affricates (e.g. Sic. *t̩r̩enu* vs. It. *treno*)<sup>20</sup>.

<sup>17</sup> I shall transcribe these sounds with [R], [] and [}] respectively. Canepari's definitions are often more fine-grained and need additional special symbols (Canepari 1999:101, 401). As far as I have been able to observe, the voiced alveolar approximant (not lateralised) described by Canepari (1999:102) as common in Apulia is attested with some limitations around Bari and in speakers of Albanian origins (on the contrary, the voiced alveolar fricative tap introduced accounting for the Italian *r* pronunciation in northern Calabrian may have a wider extension in southern Italy). Other places where liquid-*r*'s are *de rigueur*, as already introduced, are south-western Piedmont (with the [R], usual around Frabosa, and [ɻ], between Pamparato and a wider area in the Asti province, which determine varieties of those *r-sounds* known as *r monferrina*; see Cabiale 1970, and Ghia 2010). Similar sounds are typical for some conservative *patois* speakers from Salbertrand (in the Turin province) and other Alpine areas on the border (Briga Alta). Western varieties in the same valleys are renowned for using a different *r*-sound known as dental *r* (or, more locally, *valsusina r*) whose realisations oscillate between [D4] and [z4].

<sup>18</sup> In his description of the dialect of Rossano (province of Massa), Rossi (1974:413) defines a postalveolar [r], but [i]-like vocalic components are highlighted in some *r*-transcriptions given by Rohlfs (1966) for Pisan and Ligurian varieties (see Giannelli 1983; Pacini 2004). A critical overview on palatalised rhotics is offered by Hamann (2002).

Apical trill devoicing is also widespread in non-standard central and southern Italian pronunciations and is usually disregarded in the specific literature (examples are collected by Canepari 1999:440, 445, 447)<sup>21</sup>.

#### 5. Other (pretended) back r-sounds

In spite of the common idea that *r moscia* is a uvular *r*, the most common defective *r*-sounds are labiodental approximants [ʋ] (often velarised [])<sup>22</sup>.

Similarly, the supposedly French *r*'s of Italian speakers are nowadays uncommon in French itself.

Northern Italian speakers using a back *r* do not all have recourse to the same kind of articulation, but use significantly different varieties. Here is a simplified list of the most common possibilities (also possible everywhere in Italy):

<sup>19</sup> According to Canepari (1999:102), in these regions (plus Sardinia), word-initial /# r/ is replaced by /# rr/. In Sicily and southern Calabria, this is then realised, in the more conservative accents, as a voiced alveolar or postalveolar fricative/approximant sometimes transcribed as [ʐ] which is obviously neither [z] nor [ʒ] (nor their weakened counterparts). Lacking fundamental information on tongue sulcalisation, I usually simplify the transcription of these sounds, assuming postalveolar (retroflex) fricatives and approximants as basic sounds (for a review on retroflexion see Bhat 1974). In unpublished research carried out in 2007 I made several measurements on realisations of this type collected by Vito Matranga within the archive of audio-recordings available in the ALS. These approximants, fricatives and affricates show different degrees of fronting or cacuminalisation (see Matranga 2007).

<sup>20</sup> Note that the *tr-* cluster after *s*- undergoes anticipatory assimilation too (-*str*- > -ʂʂ- > -ʃʃ-). The general phenomenon (also attested for Sallentinian varieties, see Romano 1999) is well-described in Italian phonetic literature (since Millardet 1933) and a number of articulatory possibilities are specified for Calabrian dialects by Romito & Belluscio (1996), Sorianello & Mancuso (1998) and others (see Romano & Gambino 2010).

<sup>21</sup> The devoicing process is mainly attested in coda position before voiceless consonants where speakers of these varieties hyperarticulate *r*-sounds with an increase in the tension of the constriction (and slight retraction of the articulation place) by producing [r8] and/or [r̝�].

<sup>22</sup> E.g. some Piedmontese speakers presenting the labiodental approximants [V], when not suppressing the sound, tend to articulate the clusters /pr-/ and /br-/, in particularly prominent positions, respectively as [ʙ8 ( ᴿ ) ] and [ʙ( ᴿ ) ] (maybe only single-strike). That seldom happens even for Piedmontese speakers with uvular trills (similar sounds mark the pronunciation adopted for the Italian voice of the Warner Bros' cartoon character Roger Rabbit who utters [ʙ8]/[ʙ] in the realisation of initial *pr-/br-* clusters). Another example is the stereotype given by the actor Totò for the Neapolitan snobbish *r moscia* which is realised as a dental approximant (something like [D4] or [z4], see footnotes above). Finally, I shall mention here the example of a professional speaker of the regional Piedmontese TV News of the National Broadcaster RAI, who frequently lets the tip of his tongue come out from the mouth while speaking (occasionally showing linguo-labial contacts). This phenomenon systematically appears during the production of the clusters *-rt-*, *-rd-*, *-rl-* and *-rn-*, all normally including apico-alveolar contacts, replaced by predorso-alveolar contacts. They are probably induced by a preceding interdental approximant gesture (something like [D4]), which is the common *r*-sound for this speaker.

(1) speakers using a velar fricative [ɣ] also present the unvoiced variant [x] and the approximant variant [ɰ] in the appropriate contexts (mainly the unvoiced in voiceless consonant context and the approximant between vowels);

(2) speakers preferring a uvular articulation may present trilled variants [ʀ] with one or more strikes (weakened forms of these sounds are fricative/approximant variants [ʁ] or [ʁ̞]) and unvoiced allophones in voiceless contexts ([ʀ̯] / [χ]; following the same distributional rule that could be observed in French)<sup>23</sup>;

(3) speakers occasionally resort to less controlled post-uvular articulations (the same speakers of the other points above may be subject to these alternations) which could give rise to [ʕ], [ʕ̞], [ɦ] and many variants, often appearing as simple [ɐ]-like sounds in positions where a weakening is likely to take place (generally in coda) or where a reduction gives rise to vocalic glides (between vowels);

(4) speakers presenting labialisation and/or multiple articulation places use many other variants for velar and uvular *r*-sounds (see above);

(5) people affected by *r moscia* (that is, a more or less velarised/uvularised labiodental approximant) tend to occasionally allow the back articulation to prevail or to realise simple wavings between vowels, sometimes even leaving no trace of the gesture at all.

#### 6. Conclusions

In the present study, general topics have been discussed in reference to historical and present-day representations of *r*-sounds in the Italian linguistic domain which are affected by quite different sociophonetic dynamics.

In the first part of the paper, I have illustrated the normal basic realisations of /r/ ([ɾ], [r] and [rː] for Italian), its distribution and phonetic reduction rules. In Italian, singleton vs. geminate contrasts are generalised in the phonological system: /r/ and /rr/ are associated with different phonetic realisations often reinterpreted in different regional varieties on the grounds of the underlying dialectal systems. Nonetheless, the main source of *r*-variability is in social preferences and in first-language acquisition difficulties.

In the second part of the paper, I have discussed the wide range of possible slightly different realisations of apical rhotics and of their back variants, by highlighting the need for a better articulatory account (testing the presence vs. absence of palatalisation, lip-rounding and secondary articulations, as well as of concomitant gestures and conditioning effects on the surrounding sounds). Several varieties of unusual *r*-sounds have been surveyed, ranging from limp or lifeless *r*'s to pinched *r*'s and liquid *r*'s.

<sup>23</sup> A number of other possibilities arise for speakers not respecting this 'natural' distribution, then generalising for instance [x] in all the positions or extending the allophones to both /r/ and /rr/ (by neutralising the contrast). I would like to draw attention to the case of a southern area (northern Apulia) where, among a number of speakers using [] and [x] or [] and [X] as common variants of pinched *r*, one may hear some people only using the voiceless variants in phonetic contexts where they are not usual, thus being distinguished from the rest of the community (see Romano, forthcoming).

With regard to the socio- and geo-linguistic situation, several characteristics have been identified. These may help to determine different kinds of *r moscia* on the grounds of the phonetic distinction proposed in the recent literature on rhotics between trilling-variants as opposed to waving-variants.

#### Acknowledgements

Part of the work was carried out during my stay in Grenoble and benefitted from the collaboration with Cyril Trimaille and Patricia Lambert of the *LIDILEM*. I acknowledge the people of the two laboratories where I worked during those years: the former *Institut de la Communication Parlée* (*ICP*) and *Centre de Dialectologie de Grenoble* (in particular Pierre Badin and Michel Contini). I am also indebted to the staff of the Linguistic Atlases *ALEPO*, *ALI* and *ATPM* (in particular Sabina Canobbio and Matteo Rivoira) for giving me access to (or helping me to collect) audio materials on Piedmontese *r*'s. I am grateful to Manuel Barbera, Paolo Mairano and Marco Tomatis of the former Faculty of Foreign Languages of Turin and to the Synthesis team of *Loquendo Technologies Ltd.* for allowing me to access their databases of different languages and dialects. Last but not least, I would like to acknowledge Hans Van de Velde, Didier Demolin, Alessandro Vietti and Lorenzo Spreafico for having encouraged me to keep following the thread of this research. I am particularly grateful to Alessandro and Lorenzo for allowing me to publish part of my previous unpublished work in this volume.

#### References


(Délégation Générale à la Langue Française et aux Langues de France), Ministère de la Culture et de la Communication, manuscript.

Bonnard, Henri. 1982. *Synopsis de phonétique historique*. Paris: Sedes.


un'analisi preliminare. In Pier Marco Bertinetto & Lorenzo Cioni (eds.), *Unità fonetiche e fonologiche: produzione e percezione*, 142-154. Roma: Esagrafica.

Stevens, Kenneth. 1989. On the quantal nature of speech. *Journal of Phonetics* 17. 3-45.


## The spreading of uvular [ʀ] in Flanders

Hans Van de Velde<sup>1</sup>, Evie Tops<sup>2</sup> & Roeland van Hout<sup>3</sup>

<sup>1</sup>Universiteit Utrecht, <sup>2</sup>Université Libre de Bruxelles, <sup>3</sup>Radboud Universiteit Nijmegen

#### Abstract

In this paper the socio-geographical distribution of alveolar and uvular /r/ in Flanders is investigated to provide support for the idea that uvular [ʀ] has become more widespread in Flanders in the course of the 20th century. Due to its contact history with French and its relationship with German dialects, the Flemish situation might provide more insight into the controversy around the spread of uvular [ʀ] in Western Europe. Three data sources are used for this study: two existing traditional dialect surveys and a new socio-geographic survey based on a sociolinguistic approach.

#### 1. Introduction

Although /r/ is marked by large-scale variation in the languages of the world (cf. Van de Velde & van Hout 2001), the alveolar trill [r] is the prototypical r-sound according to the statistics provided by Maddieson (1984:83). Uvular [ʀ] is infrequent and its occurrence seems to require an extra or special explanation.

The rise of uvular [ʀ] in Western Europe has been debated among linguists since the end of the 19th century. Trautmann (1880) attributed the origin of uvular [ʀ] to the Parisian elite in the 2nd half of the 17th century, and Chambers & Trudgill (1980:186ff) explain the presence of uvular [ʀ] in Danish, Dutch, German, Norwegian and Swedish by the prestige and influence of French. However, it took uvular [ʀ] centuries to become the common variant in France (Martinet 1985:38-39; Carton 1995:36; Tops 2009:238-246). Moreover, the French connection has been contested as an explanation by a number of authors. Moulton (1952) and Penzl (1961) showed that varieties of German had uvular [ʀ] before (Parisian) French, and Wiese (2001) argued that the developments in German were independent from those in French.

Many studies, especially dialectological ones, report the occurrence of uvular [ʀ] in Flanders, the Dutch-speaking part of Belgium (see Section 2). In some areas the presence of uvular variants is linked to French; in other areas it is considered to be a product of the dialect continuum with German. Due to its contact history with French and its relationship with German dialects, Flanders seems to be an ideal testing ground to gain more insight into the controversy around the spread of uvular [ʀ] in Western Europe. To provide support for the idea that there is a general rise of uvular [ʀ] in Flanders and to gain more insight into the mechanisms underlying this change, the socio-geographical distribution of alveolar and uvular /r/ needs to be investigated more systematically.

The data we present come from three rich data sources: two more traditional dialect surveys (RND, GTRP) and one socio-geographic survey (RAS) based on a sociolinguistic approach. We will discuss the data and results in Sections 3 to 5. In Section 6 we will argue that we can put the results of the three data sources on a time scale, in order to make the rise and spread of uvular [ʀ] visible. It is not only observed at the borders of the language area, but there are also patterns of internal diffusion, as we will argue. We also need to discuss why uvular [ʀ] has the prestige it appears to have acquired, a question we will address in Section 7.

#### 2. Flanders and uvular [ʀ]

In this contribution the Flemish provinces will be used to interpret the regional distribution of (r). From left to right (west to east) in Map 1 we find:


France and Wallonia are French speaking. Brussels is officially bilingual French-Dutch, French being the dominant language (Janssens 2008), but the local dialect is Brabant Dutch (De Vriendt & Willemyns 1987). Belgian French has mainly uvular variants (Demolin 2001). Alveolar variants occasionally show up in different regions but they are considered archaisms (Hambye 2005:208). Unfortunately, the quality of /r/ is not transcribed in the linguistic atlas of Wallonia (Remacle 1953:59).

Map 1 – Map of the Dutch language area, situated in the Netherlands and Belgium. From: Vandeputte, Omer. 1983. *Dutch: the language of twenty million Dutch and Flemish people.* Rekkem: Stichting Ons Erfdeel. © Ons Erfdeel vzw. Reprinted with permission.

In dialectological studies uvular [ʀ] is systematically reported for the eastern part of Limburg. Grootaers (1951:40) states that all Flemish dialects have an alveolar realization except for the Limburg dialects. The same conclusion can be found in Weijnen (1991), who adds that the uvular [ʀ] is characteristic of the northeastern part of Limburg. As for the provinces of Flemish Brabant and Antwerp, Brussels is marked as a uvular area (Mazereel 1931; Baetens-Beardsmore 1971; Weijnen 1991; De Vriendt 2004). Uvular [ʀ] is reported for Aarschot (Pauwels 1958) and Turnhout (Aerts 1955). More recently, De Schutter (1999:304) observes its occurrence in the city of Antwerp and claims that it is becoming the norm in the Antwerp urban dialect, which is the most prestigious dialect in Flanders (ib:303). However, in De Schutter & Nuyts (2005), uvular [ʀ] is not mentioned as a characteristic of the urban dialect. In East-Flanders, uvular [ʀ] is known as a stereotype of the Ghent urban dialect. De Gruyter (1907) observed the new variant in the beginning of the 20th century. Rogier (1994) documented its spread in the surrounding suburbs and villages. In West-Flanders no special observations about the pronunciation of /r/ are made.

At the same time, in the 20th century, uvular [ʀ] has often been considered a speech deficit (De Schutter 1999:304; Verstraeten & Van de Velde 2001:46; Tops 2009:198). Consequently, a lot of children were sent for treatment to a speech therapist, although Blancquaert (1934:114) had argued in favor of tolerance towards uvular [ʀ]. At the Flemish broadcasting corporation speakers with uvular [ʀ] did not pass the microphone test (Van de Velde 1996:126) and especially schools for drama and eloquence banned uvular [ʀ]. The Belgian film maker and author De Kuyper (1993:37-39) describes how he was banned from a Flemish music academy due to his French /r/ (i.e., uvular [ʀ]). Nowadays, policy has changed and uvular [ʀ] (except for variants with strong friction, as occurring in the Ghent dialect) is accepted at the broadcasting corporation (Tops 2009:198). Since the 1960's text books for speech therapists have shown more tolerance toward uvular [ʀ], but even today alveolar [r] is still preferred "for technical reasons" (Timmermans 2004:33-34). Interestingly, Van Bezooijen (2003:83) suggests that some speakers are simply not able to produce an alveolar trill and use a uvular trill instead and she sees this genetic characteristic as one of the mechanisms involved in the spread of uvular [ʀ].

#### 3. RND

The dialectologist Blancquaert started collecting data in 1922 for the first part of the *Reeks Nederlandse Dialectatlassen* (RND), inspired by Gilliéron's work for the Atlas Linguistique de la France (Hagen 1995:81). Later, RND developed into a series of dialect atlases covering the complete Dutch language area, with 1956 localities and 4012 informants. The volumes covering Flanders were published between 1925 and 1962 and the data were collected between 1922 and 1953 (Reker 1997:51). A standard questionnaire, mainly consisting of sentences to be translated into the local dialect, was used for data collection by experienced field workers who transcribed the sentences on the spot in (narrow) IPA (bear in mind that portable recording equipment was only introduced in the 1950's). The Flemish data used in this study were mainly collected by Blancquaert and collaborators trained by him. For our analysis, we will focus on the 859 localities that are currently situated in the Flemish Community and Brussels Capital Region (the linguistic border was fixed in Belgium in 1962).

Blancquaert first selected localities with at least 2000 inhabitants; smaller places were added in transition zones and where distances between places were larger than 5 km (Hagen 1995:81). Blancquaert did not opt for the traditional NORMs (non-mobile, older rural males; the common type of informant in dialect geography; cf. Chambers & Trudgill 1980:33). Instead, he had a preference for informants between 20 and 40 years (Blancquaert 1948:24), who grew up in the locality and had local parents. About half of the informants belonged to the middle class and almost one quarter were women. In most localities two to three dialect speakers served as informants. It is obvious that RND aimed at collecting the use of the dialect that was contemporary at the moment of data collection. For a more detailed discussion of the characteristics of the RND informants in comparison with other dialect atlases we refer to Johnston (1985).

The transcribers distinguished two variants of /r/: *r met tongpunttrillingen* (r with trills of the tongue tip, i.e. [r]) and *gebrouwde r* ('burred r', i.e. a uvular realization). Our analysis is based on lexical items from three sentences of the questionnaire: 36 (*peer* 'pear'), 85 (*rijkdom* 'wealth') and 86 (*dorst* 'thirst'). We selected these items as they were not marked by variation in lexical form, which means that they kept the /r/ most of the time. Lexical variants and realizations in which (r) was deleted were coded as missing values. For each dialect an index score (percentage) was calculated between 0 (alveolar) and 100 (uvular).
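The index score described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for the RND data: the function name `index_score`, the string codes `"alveolar"` and `"uvular"`, and the use of `None` for missing values are all assumptions made for the example.

```python
from typing import Optional, Sequence


def index_score(codes: Sequence[Optional[str]]) -> Optional[float]:
    """Return the percentage of uvular realizations for one dialect.

    Each element of `codes` is the (hypothetical) coding of one lexical
    item: "alveolar", "uvular", or None for a missing value (a lexical
    variant or a realization in which (r) was deleted).
    """
    valid = [c for c in codes if c is not None]
    if not valid:
        return None  # no usable item, so no index can be computed
    uvular = sum(1 for c in valid if c == "uvular")
    return 100.0 * uvular / len(valid)


# Three items per dialect, as with peer, rijkdom and dorst.
print(index_score(["alveolar", "alveolar", None]))  # 0.0
print(index_score(["uvular", "uvular", "uvular"]))  # 100.0
```

Missing values are left out of the denominator, so a dialect with one missing item is scored on the remaining two.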

In total, 859 Flemish dialects were incorporated. Map 2 gives an overview of the distribution of (r) in Flanders. Of the 859 dialects, 803 (93.5%) have alveolar [r] and 56 (6.5%) have uvular [ʀ]. None of the places showed variation in the transcription of the place of articulation of (r), and the transcribers made remarks on local variation in the pronunciation of /r/ for only two dialects (0.2%).

Map 2 – Geographical distribution of (r) in RND data in Flanders (859 localities, collected between 1922 and 1953). White dots: alveolar [r]; black squares: uvular [ʀ].

The overall impression of Map 2 is that uvular [ʀ] is more characteristic of the periphery of Flanders than of its core parts. No larger urban centers are involved, except for bilingual Brussels. Map 2 shows that uvular [ʀ] is present in the following areas (for a more detailed listing of the places, see Tops 2009:204-5):


#### 4. GTRP

The Goeman-Taeldeman project aimed at collecting a phonological and morphological corpus of the Dutch dialects (Goeman & Taeldeman 1996). The coded transcriptions are available as the Goeman-Taeldeman-Van Reenen database (GTRP). In total, 613 dialects were recorded in the Dutch language area. In Flanders 189 localities were selected, a much less dense network than that of RND. Goeman (1999:58-70) presents an analysis of the social characteristics of the Dutch GTRP informants. Unfortunately, no detailed information has been published about the Flemish part of GTRP.

For each locality in Flanders there was one informant. All the Flemish data were collected between 1990 and 1993. Participants were not exclusively NORMs (Goeman & Taeldeman 1996:52-53). Urban dialects were also included in the sample, and whether a man or a woman was selected depended mainly on practical issues such as availability and willingness to participate. Non-educated informants were only selected if they had enough metalinguistic awareness and insight into the aim of the questionnaire. Almost all participants were between 50 and 75 years old at the moment of recording. The main criterion was that participants were indigenous (i.e., grew up locally), preferably had indigenous parents and spoke the dialect on a regular basis (Goeman & Taeldeman 1996:53).

Almost all Flemish data were collected and transcribed by two field workers. The questionnaire contained 1867 items and was sent to the participants about a week in advance. The recordings took about half a day for each informant. For our analysis we selected the 161 singular nouns containing /r/. In the Flemish data the transcribers only made a distinction between alveolar, uvular and deleted variants of /r/<sup>1</sup>. The deleted variants were not taken into consideration for the calculation of the index scores on the front-back dimension. Map 3 gives an overview of the distribution of /r/ in Flanders on the basis of the GTRP data.

Map 3 – Geographical distribution of (r) in GTRP data in Flanders (189 localities, collected between 1990 and 1993). White dots: homogeneous alveolar localities; black squares: homogeneous uvular localities; grey triangles: non-homogeneous place of articulation.

In these dialect data, too, places with alveolar [r] are dominant (167/189, 88.4%). Thirteen places only show uvular [ʀ] (6.9%), and nine places/informants (4.8%) mix [r] and [ʀ], but it should be noted that they almost exclusively use uvulars. Again, as in RND, uvular [ʀ] occurs in the peripheral area, with the exception of one important urban centre in the heart of East Flanders: Ghent. The occurrence of uvular [ʀ] can be summarized as follows (for a listing of the places, see Tops 2009:206):


<sup>1</sup> In the Netherlands the phonetic transcriptions are much more detailed and include variation in manner of articulation and voicing.

#### 5. The Rapid Anonymous Survey (RAS)

Tops (2009) conducted a large-scale sociolinguistic survey in Flanders collecting data on the pronunciation of /r/ for 1,912 speakers distributed over 89 localities in Flanders. The aim of the study was to get insight in the socio-geographical distribution of /r/. The technique was inspired by Labov's famous department store study of /r/ in New York City (Labov 1966). The rapid anonymous survey technique is well known thanks to the incorporation of Labov's study in most introductory textbooks in linguistics and sociolinguistics. The technique is also widely popular among undergraduate students taking their first steps in the study of language variation and change. Surprisingly, this research method is hardly used in international publications, Horvath & Horvath's work on /l/ vocalization in New Zealand and Australian English being a rare and successful exception and adaptation (Horvath & Horvath 2001). An important adaptation to Labov's work is that a short word list (shorter than Horvath & Horvath's) was used and that the speech was recorded.

The selection of the localities was done in two steps. In the first step, 39 localities were selected equally distributed over the whole of Flanders, including the main Flemish urban centers Antwerp, Bruges, Ghent and Genk, towns with a regional function (as defined in the official spatial planning documents; cf. www.ruimtelijkeordening.be) and villages. In the second step of data collection, these localities were supplemented by 50 localities selected in areas where alveolar and uvular variants of (r) co-occur. These regions were the surroundings of Ghent, the east of Limburg (the boundary of the old uvular [ʀ] area) and an area north of Antwerp (Hoogstraten and surroundings).

People walking on the street or shopping were approached in the 89 localities by a field worker speaking standard Dutch with the request to participate in a study on voice quality conducted by Brussels University (Vrije Universiteit Brussel) that would take less than 2 minutes of their time. This guise was used as a justification for the recording of the participant's speech and kept the purpose of the research hidden. If somebody agreed to participate, the interviewer asked their age and whether they were local (only local participants were selected for the analysis). They had to read 20 words, listed on five cards, offered in a random order. The aim was to fill a quota sample of four groups of five participants: two age groups (old vs. young) by gender. The age ranges were 16 to 35 (young) and older than 35 (old). When there were problems filling the quota, the locality in question was revisited. With only a few exceptions (even after returning twice to the same place), the quota could be filled fairly well. The total number of participants was 1,912.

The speech was recorded on a portable TASCAM DA-P1 recorder, with a Sennheiser MD425 dynamic supercardioid hand-held microphone. The recordings not only had the sound quality required for reliable auditory analysis, but most of them were also good enough for acoustic analyses (cf. Tops 2009:21-120).

The word list to be read aloud contained 20 monosyllabic words, distributed over five cards. Eight words were (r)-less distractors. All the cards ended with a distractor, to avoid end-of-list effects affecting the realization of (r). Twelve words contained our variable (r): three in onset position (*reep, rood, reus*), three in coda position (*gaar, zuur, voer*), three in an onset cluster starting with [t] (*troon, trein, trui*) and three in a coda cluster ending with [t] (*buurt, kaart, woord*).

The number of usable (r) realizations collected was 22,720. Twelve variants, along the dimensions of trilling, friction, place of articulation, and voicing, were distinguished on the basis of a combination of auditory and spectral analyses by the second author. However, the methodology was developed in collaboration with a number of specialists in the fields of phonetics, dialectology and language variation and change. In cases of doubt, these specialists were also consulted for the coding of the variants. This resulted in six alveolar variants, four uvular variants, schwa and a null realization. Per locality the number of variants ranged between four and twelve. 74 localities (83.1%) had eight to eleven variants. This shows the enormous variability of the pronunciation of /r/ within localities. Some variants were particularly related to the position of (r) in the word, but we will not further investigate the role of the linguistic context in this contribution. Except for the front-back distinction of alveolar vs. uvular, there were no socio-geographic patterns in the distribution of the variants. Therefore, and for the sake of comparison with the RND and GTRP data, we will focus on the alveolar-uvular dichotomy in the remainder of this paper. We found 15,623 alveolar realizations (68.8%), 7,044 uvular realizations (31.0%), 11 schwas (0.0%), and 55 deletions (0.2%). We computed for each speaker a front-back index or percentage, excluding the schwa and the null realization. A score of 0 means only alveolar realizations, a score of 100 means only uvular realizations. The next step was to compute the average percentages per locality. The distribution of the scores for all localities can be found in Figure 1.
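The two computational steps described here, a front-back index per speaker followed by an average per locality, might be sketched as below. The token labels, the function names `speaker_index` and `locality_means`, and the toy input are assumptions made for illustration and do not reproduce the coding scheme or the data of the survey.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical labels for the realizations left out of the index.
EXCLUDED = {"schwa", "null"}


def speaker_index(tokens):
    """Front-back index of one speaker: 0 = only alveolar, 100 = only uvular."""
    scored = [t for t in tokens if t not in EXCLUDED]
    if not scored:
        return None  # speaker produced no alveolar or uvular token
    uvular = sum(1 for t in scored if t == "uvular")
    return 100.0 * uvular / len(scored)


def locality_means(speakers):
    """Average the speaker indices per locality.

    `speakers` is an iterable of (locality, tokens) pairs.
    """
    per_locality = defaultdict(list)
    for locality, tokens in speakers:
        idx = speaker_index(tokens)
        if idx is not None:
            per_locality[locality].append(idx)
    return {loc: mean(values) for loc, values in per_locality.items()}


# Invented toy data: two speakers in locality "A", one in "B".
toy = [
    ("A", ["alveolar"] * 12),
    ("A", ["uvular"] * 11 + ["null"]),
    ("B", ["uvular"] * 12),
]
print(locality_means(toy))  # {'A': 50.0, 'B': 100.0}
```

Because nearly all speakers are homogeneous, a locality score between 0 and 100 in such a scheme mostly reflects the proportion of uvular speakers rather than mixing within speakers.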

How is the distinction of alveolar and uvular variants distributed over speakers and localities? There is a lot of variation between speakers, ranging from completely alveolar to completely uvular, but there is a remarkable absence of variation within speakers. Individual speakers turn out not to mix uvular and alveolar variants in our recordings. The number of speakers who mix both variant types is 102 (5.3%), of whom only 38 (2.0%) have a mix of 20% or more of one variant and 80% or less of the other one.

Figure 1 – The frequency distribution of the average front/back scores for the 89 localities.

Figure 1 shows that most localities have more alveolar realizations, as is also clear from the mean percentage of 31.1%. Most localities have a mixture of front and back variants. Only seven places have exclusively alveolar [r], and only two have exclusively uvular [ʀ], which implies that uvular [ʀ] is present in almost all localities. When localities have a score somewhere between 0 and 100, these localities have a mix of uvular and alveolar speakers, as the number of mixed speakers is low (5.3%). In most mixed places alveolar realizations are dominant (60 scores below 50 and above 0) and only eight places have a score of 80% or more on the front-back dimension. The standard deviation of 45.5 reflects a high degree of variability between the localities.

If a change from alveolar to uvular is ongoing in Flanders, the localities will show an increase in the uvular index when younger and older speakers are compared. Figure 2 presents the comparison between the two age groups per locality. The indices of the younger group of participants are represented on the vertical axis and the indices of the older group of participants on the horizontal axis. The diagonal indicates the position of the indices if the two groups had the same scores. A stable age distribution, indicating absence of change in progress, would produce scores oscillating randomly near the diagonal.

Figure 2 – Scattergram of the mean front-back index (percentages of uvular [ʀ]) of the younger group versus the older group for 88 localities (the young age group was lacking in one of the localities).

The pattern in the scattergram of Figure 2 is remarkably clear. All dots, with only a few exceptions, are in the upper part, above the diagonal. Younger speakers have more uvular realizations than older speakers. This sharp shift towards uvular realizations is found in all provinces (Brabant, Antwerp, Limburg, East-Flanders), except West-Flanders, where all localities had low scores (about or less than 20%), both for the younger and older age groups, indicating that this region has mainly alveolar realizations. Some localities have a strong shift, from a percentage of (about) 0% for the older age group to more than 80% for the younger age group. The few scores in the lower part, under the diagonal, might be the result of sampling fluctuation.
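The reading of the scattergram can be made concrete with a small sketch that classifies each locality by its position relative to the diagonal. The function names, the optional `tolerance` parameter and the four example localities are invented for illustration; they are not the survey data.

```python
def classify(old_index, young_index, tolerance=0.0):
    """Position of a locality relative to the diagonal of the scattergram.

    "above" means the younger group has the higher uvular index.
    """
    diff = young_index - old_index
    if diff > tolerance:
        return "above"
    if diff < -tolerance:
        return "below"
    return "on"


def summarize(localities):
    """Count localities above, on and below the diagonal.

    `localities` maps a locality name to (old_index, young_index).
    """
    counts = {"above": 0, "on": 0, "below": 0}
    for old_index, young_index in localities.values():
        counts[classify(old_index, young_index)] += 1
    return counts


# Invented example values (percentages of uvular realizations).
toy = {
    "loc_a": (0.0, 85.0),   # strong shift towards uvular
    "loc_b": (20.0, 20.0),  # stable
    "loc_c": (15.0, 10.0),  # slightly below the diagonal
    "loc_d": (40.0, 75.0),
}
print(summarize(toy))  # {'above': 2, 'on': 1, 'below': 1}
```

A change in progress towards uvular realizations would show up as a large majority of localities in the "above" category.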

The geographical age shift can be illustrated with two maps: Map 4 gives the geographical distribution for the old speakers, Map 5 for the young ones. The darker the symbol, the higher the percentage of uvular [ʀ] use in a locality.

Map 4 – Geographical distribution of the front-back index for the older age groups in ras, ranging in gray between white symbols (0%, only alveolar realizations) and black symbols (100%, only uvular realizations).

Map 4 shows the use of uvular [ʀ] by older speakers. Uvulars are frequently used in:


Additionally, uvular [ʀ] also shows up in:


Map 5 – Geographical distribution of the front-back index for the younger age groups in ras, ranging in gray between white symbols (0%, only alveolar realizations) and black symbols (100%, only uvular realizations).

Map 5 for the younger age groups shows a strong geographical expansion of the uvular realizations in comparison with the older age groups in Map 4.


#### 6. Mapping the rise of uvular [ʀ] in Flanders: Integrating the three data sources

In the preceding sections, three different data sources on the realization of /r/ were discussed. Despite the differences between the sources, the geographical patterns seem to fit at first sight. No obvious contradictions show up in the geographical patterns observed, as the same areas consistently come out as centers of gravity and perhaps expansion of uvular [ʀ]. Can we push the analysis one step further by reconstructing a time scale on the basis of our data sets that gives a more detailed and precise impression of the rise of uvular [ʀ] in Flanders? That of course raises the question of the comparability of our data sources. Therefore we look at the similarities and differences between RND, GTRP and RAS.

In Table 1 we have defined six pivotal characteristics that are used to evaluate our three data sources. The first characteristic is defined as dialect as target. RND and GTRP obviously have the local dialect as the target variety: the participants were asked to translate sentences and words into their local dialect and were selected for this purpose. In RAS the target variety was not made explicit to the participants and they did not know the real purpose of the study, as the guise of a voice quality study was used. However, the whole context of the study aimed at eliciting non-dialect / standard speech: the informants were addressed in standard Dutch by a researcher who identified as being from the university and were asked to read aloud a word list with (standard) Dutch words. Reading out loud single words is a regular activity in primary school. The visible use of recording equipment increased speech monitoring and the use of standard speech. None of the informants spontaneously translated the words into their local dialect, as was clear from the quality of their vowels. Therefore RAS is characterized as minus for dialect as target.


Table 1 – Characteristics of the three data sources.

The RAS study aimed at charting local variation by sampling individual speakers, who were not selected because of their expertise as in GTRP and RND. GTRP selected only one speaker per locality, who was presumed to be an expert in the local dialect. Although older, RND is more sociolinguistic in its approach (Hagen 1995). The field workers commonly worked with two or three informants per locality and asked questions about the local sociolinguistic situation (e.g., do people also speak standard Dutch or French, differences within the dialect, immigrants from other areas). We found a couple of remarks about the pronunciation of /r/ in the Flemish localities, but this did not lead to variation in the transcriptions of /r/ within localities. Therefore, RND is also marked as minus for local variation. Variation between speakers within localities is at the heart of the RAS data collection, which aimed at 20 informants per locality. RAS used isolated words as triggers for eliciting data, GTRP isolated words and short phrases, RND sentences.

The RAS survey was presented as a study on voice quality, not on dialect or standard pronunciation. That means that the informants focused less on their own speech and language than in RND and GTRP, which openly direct the awareness of the informants to the distinction between standard and dialect. Furthermore, RND and GTRP selected expert speakers of the local dialect, while RAS selected speakers that were just local, without any evaluation of their speech characteristics.

Despite these differences, there is one striking correspondence in the outcomes of the three surveys. There is hardly any variation on the front-back dimension of /r/ on the individual level. For RAS, based on twelve tokens of (r), we found a percentage of 5.3% mixed speakers. For GTRP, based on 161 (r) tokens, variation was found in only two localities (1.1%); no variation was found in the other 187 localities, despite the large number of words per locality. However, it must be mentioned that this low number of localities showing variation for place of articulation can be partly a result of the transcription method used in Flanders (Rob Belemans, p.c.). None of the 859 places in RND showed variation for the three words investigated, and only for two places was a remark made about the pronunciation of /r/.

How can we put the three surveys on a time scale? We can globally estimate the average year of birth of the informants in the three surveys. For RND the age range of 20 to 40 years of age is reported as the default, which gives an average of 30 years of age. The time range of data collection in Flanders was between 1922 and 1953, the majority of the data being gathered in the 1920's and 1930's. That results in 1910 as a rough estimate of the year of birth of RND participants (1940 – 30 years). GTRP was recorded between 1990 and 1993, with an average age of 60.5 years. That gives an estimate of 1930 as year of birth. The RAS data were collected between 2002 and 2004; the average age of the participants in the young age group is 24 and in the old group 54. That gives an estimated average year of birth of 1980 for the younger group and of 1950 for the older group. These outcomes were included in Table 1.

Map 6 – The spreading of uvular [ʀ] in Flanders between 1930 and 2000. Black: areas RND (1930); dark grey: areas GTRP (1950); grey: areas RAS old (1970); light grey: areas RAS young (2000).

To estimate a time slot for each of the data sources, we added 20 years to the estimated average year of birth, to indicate that the group involved had reached the adult life stage. This results in the following periods: 1930 (RND), 1950 (GTRP), 1970 (RAS old) and 2000 (RAS young). Map 6 visualizes the geographical expansion of the use of uvular [ʀ] in Flanders. Instead of outcomes for individual localities as in Maps 2 to 5, we have indicated areas where uvular [ʀ] shows up. The uvular [ʀ] areas in the oldest data (RND 1930) are marked in black. For the other sources/periods we used shades of grey, ranging from dark grey (1950) to light grey (2000).
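The arithmetic behind these time slots can be retraced in a few lines. This is only a sketch under stated assumptions: the reference year 1940 for RND is the one used in the text, whereas 1991.5 and 2003 are midpoints of the reported collection periods chosen here for illustration, and rounding to the nearest decade is likewise an assumption about how the rough estimates were obtained. The function name `time_slot` is hypothetical.

```python
ADULT_OFFSET = 20  # years added to the year of birth, as described in the text


def time_slot(collection_year, mean_age, round_to=10):
    """Return (estimated year of birth, time slot), both rounded to a decade."""
    birth = collection_year - mean_age
    birth_rounded = round_to * round(birth / round_to)
    return birth_rounded, birth_rounded + ADULT_OFFSET


# (reference collection year, mean age); midpoints are assumptions.
sources = {
    "RND": (1940, 30),
    "GTRP": (1991.5, 60.5),
    "RAS old": (2003, 54),
    "RAS young": (2003, 24),
}
for name, (year, age) in sources.items():
    print(name, time_slot(year, age))
# RND (1910, 1930)
# GTRP (1930, 1950)
# RAS old (1950, 1970)
# RAS young (1980, 2000)
```

Under these assumptions the sketch returns the four periods used in Map 6.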

The following geographical patterns show up:


#### 7. Conclusion and discussion

Our three data sources provided substantial and complementary information on the occurrence and rise of uvular [ʀ] in Flanders. The dialect data sources RND and GTRP showed that uvular [ʀ] had already acquired a (modest) position in the dialects of Flanders in the first half of the 20th century. This was confirmed by observations in the dialectological literature on specific dialects. The socio-geographical RAS data made clear that the use of uvular [ʀ] sharply increased, as witnessed in particular by the data collected on the younger age groups. Whereas uvular [ʀ] was particularly found in the periphery of the language area covered by the corpus (East-Limburg and occasionally near the language border with French) in the RND and GTRP data, it is present in all provinces of Flanders nowadays, with the exception of West Flanders, where uvular [ʀ] only occasionally shows up. It is interesting to note that in the literature Bruges, the largest urban area in West-Flanders, has been mentioned several times as a place where uvular [ʀ] was observed (Weijnen 1991:190). Bruges was part of the RAS database, and the mixed picture of Bruges is confirmed by the data. There were 18 homogeneous alveolar speakers, but two outspoken uvular speakers as well (one old, the other young). According to our RAS data Bruges does not seem to be an expansion center of uvular [ʀ], but its presence and status certainly need to be investigated in more detail.

The three databases together, as explained in Section 6, reveal the patterns of an ongoing change from alveolar [r] towards uvular [ʀ], triggering automatically the question about the origins of this change. Several origins or sources seem to be present. The most straightforward explanation can be given for uvular [ʀ] in Limburg, as it is part of a larger and older Germanic dialect continuum marked by the uvular [ʀ]. Van Reenen (1994) concludes on the basis of the gtrp data from the Netherlands, that Dutch Limburg is the core area of uvular [ʀ], but he also observed the occurrence of uvular [ʀ] in the Dutch province of North Brabant, with Breda as the center of expansion. Mees & Collins (1982) observed that uvular [ʀ] is common in educated speech in large parts of the Netherlands, including Limburg and North-Brabant.

The Dutch North-Brabant area borders the Antwerp province and might be the trigger for the emergence of uvular [ʀ] in the north of the province of Antwerp. Since the 1980's this area (and also the Belgian East-Limburg area near the Dutch border) has seen a large influx of immigrants from the Netherlands, showing an increasing trend in the last two decades (Sumresearch 2006; WODC 2009). In the beginning, a lot of Dutch immigrants were retired, wealthy people, motivated by economic reasons (cheaper housing, tax regulations). Since the 1990's a broader section of the Dutch population has been emigrating to Flanders, often for economic reasons. It should be noted that a substantial part of them remain closely attached to the Netherlands, for instance through a job in the Netherlands (WODC 2009:23). The number of Dutch immigrants ranged between 50 and 80 per 10000 inhabitants in the period 1997-2003 in the area bordering the Netherlands (Sumresearch 2006:16). Specifically for the region north of Antwerp a yearly increase of about 980 inhabitants coming from the Netherlands is observed, while at the same time about 600 inhabitants leave the area for other Belgian places (Sumresearch 2006:17). It is not unlikely that the increasing presence of people from the Netherlands, of whom many come from a uvular [ʀ] area and stay connected to it (e.g., by working in the Netherlands; WODC 2009:23), is a factor in the spread of uvular [ʀ] in the region north of Antwerp. Of course, the prestige of the Antwerp dialect, and the increasing use of [ʀ] in Antwerp in recent years (De Schutter 1999), will also add to the increased use of uvular [ʀ] north of Antwerp.

The occurrence of the uvular [ʀ] along the Germanic-Romance language border can be brought about by the impact of varieties of French having uvular [ʀ]. Kruijsen (1995) established stronger tendencies of borrowing from French along the language border in the province of Limburg. This effect was stronger the closer a place was to the language border. He admittedly investigated lexical borrowing, whereas the /r/ pronunciation is in fact a structural borrowing. Kruijsen (1995) found, however, some more general patterns of borrowing that do not exclude the borrowing of uvular [ʀ]. The mechanism of borrowing can help to explain why places shift in reporting a uvular [ʀ] when the three data sources are compared. The uvular [ʀ] may arise because of bilingual speakers and language contact, but may disappear by counter-effects as well. An intensive language contact situation and active bilingualism, in combination with the dominance of the French language, offer a plausible platform for the occurrence of uvular [ʀ] in Brussels and its surroundings. Patterns of diffusion may have played and still may play a role in propagating the uvular [ʀ] in the municipalities of Brussels.

The Ghent uvular [ʀ] is a famous case in the Dutch dialectological literature, as is the earlier mentioned city of The Hague in the Netherlands. The origin and prestige of the uvular [ʀ] is found in French, the language learned and used by the local bourgeoisie who started to use this form in their dialect and standard Dutch. The ras data testify how strong nowadays the uvular [ʀ] is expanding around Ghent (see also Rogier 1994). The uvular [ʀ] apparently acquired prestige in Ghent and its surroundings.

The situation in the Flemish Brabant places is much less transparent. Perhaps Brussels had an impact, perhaps it was the same mechanism, only later, as in Ghent and The Hague, perhaps a combination of both. It may have its origin in an urban hierarchy in which Brussels has prestige. The hierarchy matches with the direction and patterns of diffusion in Flemish-Brabant.

The literature often reports how rapidly the spreading of the uvular [ʀ] must have taken place in France and Germany. The ras data confirm in the age patterns found (see Figure 2) that the change can be fairly rapid and may have the contours of a sharp shift rather than a gradual curve. This means that the geographical and social embedding can develop or construct pathways along which language change can proceed with a strong and intensive impetus. In such rapid and vigorous changes it is likely that children play an important role, just as in the spreading of approximant /r/ in the Netherlands (Van Bezooijen 2005). The /r/ sound being one of the latest to be acquired, it seems to be sensitive to phonetic adaptation in childhood (Van Bezooijen 2005:29) and children might take over uvular [ʀ] from other children, not from their parents. The fact that (some of the) uvular variants are easily distinguishable from alveolar variants might also be a factor in the speed of this change. When uvular [ʀ] develops overt positive prestige, it may rise radically in speech communities at the expense of the alveolar variants. This explains at the same time the rigor with which uvular [ʀ] was contested, as mentioned often in the literature.

The ease of perception relates to the question of the linguistic embedding of the alveolar and uvular variants. As to the linguistic embedding of the uvular [ʀ] our data give some relevant outcomes. The number of mixed speakers is remarkably low. There is no mention of mixing by individual speakers in the rnd data, but according to the remarks in the atlas one of the three informants in Bree (Limburg) is an alveolar [r] speaker, uvular [ʀ] being the 'dominant' variant and in Strombeek-Bever (near Brussels) one informant – a primary school teacher – speaks with [r], while [ʀ] is 'common'. The gtrp data contain two speakers with mixed realization. The most obvious explanation, given the strong regional concentration and the remarks in the notes of rnd, seems to be the impact of the alveolar standard norm. The ras data contained 102 mixed speakers (5.3%). They occurred in all kinds of localities and we could not trace a further explanatory pattern in their occurrence. We would have expected more mixed speakers in mixed localities, but we did not find such distribution. On the other hand, the variability within places was large, which sometimes was the consequence of the differences between the two age groups investigated.

In the embedding of the change from alveolar to uvular we found no covariation patterns with other variants having a strengthening or mediating effect in the transition from alveolar to uvular variants, for instance by using a trilled uvular [ʀ] first, followed by the occurrence of fricative variants, or by using uvular variants first in onset or coda. No traces of specific tracks or routes came about. This may explain the absence of mixing speakers. This suggests that it is worthwhile to perform a more detailed study on mixed speakers. At any rate, the high variability between speakers and the ease of perception of the alveolar – uvular distinction make the /r/ a perfect candidate to get involved in social patterning. However, it should be noted that not all uvulars are easy to distinguish from alveolars, and that this might also play a role in this change in progress.

The ras data showed that there are hardly any homogeneous communities with respect to /r/. That means that /r/ is inherently marked by variation, that uvular [ʀ] is almost everywhere and that these variations may start to coincide to reach the stage of an incipient change. External forces or factors (borrowing from French, neighboring dialects, migration from the Netherlands) may contribute to strengthen patterns of co-incidence, resulting in a rapid and massive spread of uvular [ʀ] over Flanders. Some of our explanations are very tentative, and to fully understand the rise of uvular [ʀ] more research is needed to understand the role of and interaction between the factors suggested in this paper.

#### References


rnd. Reeks Nederlandse Dialectatlassen. Antwerpen.


Remacle, Louis. 1953. *Atlas linguistique de la Wallonie*. Liège: Vaillant-Carmanne.


## Instability of the [r] ~ [ʀ] alternation in Montreal French: An exploration of stylistic conditioning in a sound change in progress

#### Gillian Sankoff1 & Hélène Blondeau2 1University of Pennsylvania 2University of Florida

#### Abstract

This chapter focusses on the middle phase of a very rapid change, exploring the relation between the phonological conditioning and the stylistic conditioning of the variation across the lifespan with regard to the situation of the speaker in the change spectrum. An analysis of the real-time change from apical [r] to posterior [ʀ] in Montreal French for two speakers across the lifespan illustrates that the sensitivity to stylistic conditioning is a complex phenomenon. Although both speakers acquired the apical variant as children they are not equally sensitive to the stylistic environment. Further research using a combination of trend and panel study needs to be done on other variables involved in the process of change if we want to better understand the relation between stylistic markedness and the process of change.

#### 1. Introduction<sup>1</sup>

Previous studies of sound change have indicated that change tends to proceed incrementally. The many ongoing sound changes in Philadelphia vowels, for example, show a regular progression across generations in the elegant regressions of Labov (2001). Regular, incremental progression also appears to be the order of the day in the massive vowel rotation of the Northern Cities Shift (Labov et al. 1972; Labov 1994), in the retrograde shift of the Parisian vowels (Lennig 1978), in the raising of (o) in Korean (Chae 1995) and many other cases.

<sup>1</sup> We thank the National Science Foundation for funding this research ("Language Change Across the Lifespan", Grant BCS-0132463 to Gillian Sankoff). This joint research was begun when Hélène Blondeau held a postdoctoral fellowship from the Fonds pour la Formation de Chercheurs et l'Aide à la Recherche, Gouvernement du Québec, at the University of Pennsylvania in 1999 – 2001. We gratefully acknowledge the invaluable contributions of David Sankoff and Henrietta Cedergren in co-designing and implementing (along with the first author) the original Montreal study in 1971; of Pierrette Thibault and Diane Vincent in carrying out the 1984 followup study; and of Diane Vincent, Marty Laforest and Guylaine Martel in undertaking the 1995 followup. We are especially grateful to Pierrette Thibault for her help in making materials of many kinds available for our present research – often at short notice, and thank her and Bill Labov for discussion of theoretical and methodological issues, as well as questions of substance. For their assistance in coding and verification of the data, we thank Anne Charity, Alice Goffman, Daniel Alejandro Gonzales, Sarah Moretti, and Sergio Romero. Michael Friesner and Damien Hall not only helped with final coding and verification, but also made many useful suggestions on an earlier version of the manuscript.

With respect to consonants, incremental change seems less obvious. More discrete in nature, consonantal change might be susceptible to more dramatic or rapid change. Here again, available studies point to quantitative alteration such that the innovative form becomes increasingly dominant over time (e.g. Cedergren 1973b, 1988; Labov 1994; Haeri 1994)<sup>2</sup>.

This established finding, however, does not imply that sound change *must* operate incrementally. Our research on the replacement of Montreal French apical [r] by posterior [ʀ] in the 1960s – 1990s has indicated a drastically different pattern for the implementation of this change (Sankoff et al. 2001; Blondeau et al. 2003; Sankoff & Blondeau 2007). In this change from above, many individual speakers have passed from a highly variable use of both [r] and [ʀ], to a stage in which they are categorical or near-categorical users of [ʀ], without having used any phonetically intermediate variants.

In the current paper, we examine the linguistic behavior of two speakers across the lifespan in order to illuminate the role of stylistic variation in different phases of the change. This detailed analysis allows us to explore the relation between the phonological and stylistic conditioning with regard to the situation of the speaker in the change spectrum.

After providing a summary of our previous research on the [r] → [ʀ] change in Montreal, and explaining our methodology, the article concentrates on the individual variability, more specifically on the stylistic conditioning of the variation.

#### 2. Our previous research on the [r] → [ʀ] change in Montreal

In studying the real-time change from apical [r] to posterior [ʀ] in Montreal French, we have employed both trend and panel comparisons. This was made possible through the use of three corpora, recorded in 1971, 1984 and 1995 (Sankoff & Sankoff 1973; Thibault et al. 1990; Vincent et al. 1995). Our data on Montreal include 120 speakers recorded in 1971, and 60 of the same people recorded again in 1984. In addition, 12 younger speakers were added in 1984. Of the original speakers, 12 were recorded again in 1995, along with 2 from the younger 1984 cohort.

<sup>2</sup> One striking exception to the gradual character of changing relative frequencies in consonantal change is documented in Trudgill's re-study of Norwich, in the merger of /f/ and /th/, and non-initial /v/ and /d/. He found that "not a single speaker in the 1968 sample showed even one instance of this phenomenon, [but] of people born between 1959 and 1973, 41% have the merger variably; and 20% have a total merger, i.e. /θ/ has been totally lost from their consonantal inventories" (1988:43). Many variable consonantal alternations are, of course, not involved in change, e.g. the alternation in English of (th) and (dh) with affricates and stops in Philadelphia (Labov 2001, Chapter 3); and Spanish s→h→0 in Panama (Cedergren 1973a).

Our first paper on (r) (Sankoff et al. 2001) was based entirely on panel comparisons of individuals selected from the three corpora. Making maximal use of the reduced 1995 corpus, we studied the 14 speakers carried through 1995, along with a further 11 for whom comparisons were possible between 1971 and 1984 only. We were surprised to discover that a sizeable minority of speakers had altered their usage significantly over the years, and decided that an expanded group of subjects was necessary in order to understand the course of the change more fully, as it was implemented by individual speakers. In a second study, we examined the trajectories of several individuals, comparing their implementation of the [r] → [ʀ] change with their adoption of an ongoing morphological change from above (Blondeau et al. 2003). In a third study, an enlarged sample was designed to make trend vs. panel comparisons over the 1971-1984 period (Sankoff & Blondeau 2007). This paper clearly shows the change as being implemented chiefly by a younger cohort of speakers joining the pool of [ʀ] users, and that change over the lifespan by individual speakers is part of the general movement, but not the driving force.

#### 3. Methodology

As in our previous research, this paper reports on the two major variants of interest in the ongoing change<sup>3</sup>:


<sup>3</sup> In addition to these two, we coded for four other variants: cases which were too indistinct to hear were coded as *indistinct*, and removed from further consideration; deleted (r) in final clusters were coded as *deleted*; a fifth variant was the rather rare *retroflex* known locally as the 'American r', and articulated as in English Canadian pronunciation; and a final variant was *vocalized* (r). This variant, most often found in the coda environment, though not restricted to it, is very frequent in the speech of many Montrealers, especially in function words like *sur* and *pour*.

For each speech sample, we followed Clermont & Cedergren (1979) in calculating the percentage of [ʀ] as a function of the two consonantal tokens, according to the formula [ʀ] / ([ʀ] + [r]) \* 100. We then carried out χ<sup>2</sup> analysis to verify whether codings were significantly different, taking the .05 level as our baseline. When two codings were more dissimilar than this, we had a third person re-code, then (in most cases) held a group session in which we reconciled the codings. For a handful of very difficult samples (in some instances because of poor sound quality), we reconciled the codings ourselves in the course of the analysis necessary for this paper.
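The percentage formula and the coder comparison described above can be illustrated with a minimal Python sketch. It rests on assumptions that the text does not state: that the χ<sup>2</sup> test was run on a 2×2 table of coder by variant, without continuity correction, and that a third coder was called in whenever the statistic exceeded the .05 critical value for one degree of freedom (3.841). The names `Coding`, `percent_posterior`, `chi_square_2x2` and `needs_third_coder` are hypothetical, and the token counts in the usage lines are invented for illustration only.

```python
from dataclasses import dataclass

# Critical value of chi-square at the .05 level with 1 degree of freedom.
CHI2_CRITICAL_05_DF1 = 3.841


@dataclass(frozen=True)
class Coding:
    """Token counts from one coder for one speech sample (hypothetical structure)."""
    posterior: int  # tokens coded as posterior [ʀ]
    apical: int     # tokens coded as apical [r]


def percent_posterior(c: Coding) -> float:
    """Percentage of [ʀ] among the two consonantal variants: [ʀ] / ([ʀ] + [r]) * 100."""
    total = c.posterior + c.apical
    if total == 0:
        raise ValueError("sample contains no consonantal tokens")
    return 100 * c.posterior / total


def chi_square_2x2(a: Coding, b: Coding) -> float:
    """Pearson chi-square for the coder-by-variant table (assumed test design)."""
    observed = [[a.posterior, a.apical], [b.posterior, b.apical]]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("expected count of zero; test is undefined")
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def needs_third_coder(a: Coding, b: Coding) -> bool:
    """True when the two codings differ significantly at the .05 level."""
    return chi_square_2x2(a, b) > CHI2_CRITICAL_05_DF1


# Invented counts: two coders who roughly agree, and two who do not.
close_pair = (Coding(posterior=60, apical=40), Coding(posterior=62, apical=38))
far_pair = (Coding(posterior=30, apical=70), Coding(posterior=60, apical=40))
print(percent_posterior(close_pair[0]))
print(needs_third_coder(*close_pair), needs_third_coder(*far_pair))
```

With very small token counts a χ<sup>2</sup> approximation becomes unreliable, so a sketch like this would only be a first screen before the group reconciliation session the text describes.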

The next step was to code for the independent variables we predicted might condition the alternations for the variable speakers. In the present paper, we report our findings on stylistic conditioning for two speakers recorded in all three periods between 1971 and 1995.

#### 4. Individual [r] ~ [ʀ] variability

A first question to be asked is how typical is intra-individual variation? To provide a general assessment of this question, we examined all the speech samples (124) we have coded for (r) variability across all time periods.


Table 1 – All speech samples that form the pool for studying the conditioning of (r) variability.

Since the general findings on change in progress led us to expect incremental change throughout the community, we were surprised to discover that the majority of speakers tend toward categorical use of one of the two variants. Eighty-three of the 124 speech samples (that is, 67%) exhibit categorical or near-categorical behavior on the part of the speakers (if near-categorical is defined as within 10 percentage points of 0% or 100%). Clermont & Cedergren's findings on the entire 1971 sample had also revealed most of the speakers to be close to 0% or 100%, but we would have assumed that a real-time comparison would show more intermediate speakers, if the change progressed incrementally.
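The near-categorical criterion given above can be written out as a short Python sketch. The function names are hypothetical, and treating the 10-point margin as inclusive is an assumption, since the text does not say how boundary values were handled. The percentages in the usage lines are the overall [ʀ] values reported in this chapter for Lysiane B. (7%, 65%, 75%) and André L. (61%, 66%, 69%).

```python
def is_near_categorical(percent_posterior: float, margin: float = 10.0) -> bool:
    """True if a sample lies within `margin` points of 0% or 100% [ʀ].

    The inclusive boundary is an assumption made for this illustration.
    """
    if not 0 <= percent_posterior <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    return percent_posterior <= margin or percent_posterior >= 100 - margin


def share_near_categorical(percentages: list[float]) -> float:
    """Proportion of samples that are categorical or near-categorical."""
    if not percentages:
        raise ValueError("no samples given")
    return sum(is_near_categorical(p) for p in percentages) / len(percentages)


# Overall [ʀ] percentages reported in the text for the two speakers.
lysiane = {1971: 7, 1984: 65, 1995: 75}
andre = {1971: 61, 1984: 66, 1995: 69}
for year, pct in lysiane.items():
    print("Lysiane", year, is_near_categorical(pct))
for year, pct in andre.items():
    print("André", year, is_near_categorical(pct))

# The proportion reported for the full pool: 83 of 124 speech samples.
print(round(100 * 83 / 124))
```

Under this criterion only Lysiane's 1971 sample counts as near-categorical; her later samples and all three of André's fall in the intermediate range.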

Most of the near-categorical speakers of 1971 stayed that way in 1984, but a majority of the variable speakers moved towards categoriality. In Sankoff & Blondeau (2007), we divided our 32-speaker panel into 'low', 'intermediate' and 'high' users of the innovative [ʀ] variant in 1971. Only 2 of the 12 'low' range users of [ʀ] in 1971 had moved into the 'intermediate' range by 1984. On the other hand, most of the 'intermediate' speakers of 1971 had moved into the 'high' range by 1984. That category increased from 12 to 18 speakers by 1984, with more than half of the panelists now having become categorical or near-categorical users of innovative [ʀ]. From the point of view of individuals, then, it seems that being in the intermediate range of [r] ~ [ʀ] variability is a very unstable state, with most intermediate range speakers moving to categoriality over their lifetimes.

Of the two speakers selected for the study of stylistic conditioning in the current paper, one (André L.) was in the intermediate range over all three time periods, whereas the other (Lysiane B.) was a virtually categorical user of the apical variant in 1971, and showed considerable change later in life.

#### 4.1 Stylistic conditioning of [r] ~ [ʀ] variability

The question addressed in this paper is whether speakers who have adopted the innovative [ʀ] in variation with the traditional [r] also show sensitivity to stylistic considerations. Innovative [ʀ] is a change from above, higher values of [ʀ] being associated with women and with higher linguistic market indices (Sankoff et al. 2001). Thus, it is reasonable to investigate whether speakers associate [ʀ] with formal style, or youth, or women, or higher social class, and on the other hand, whether they associate [r] with being old or old-fashioned, or with intimacy or informality. We have modeled the change as one in which many speakers would have acquired [r] in primary acquisition in the family setting, adopting [ʀ] later in childhood or adolescence under the influence of peers (Sankoff & Blondeau 2007). Thus stylistically, it is possible that speakers who have made such a change over their own lifetimes will associate the [r] variant with family and their own childhood.

Of all the middle-range speakers, we chose two of those who were followed across the 24-year time span of the study for stylistic analysis. Both in their twenties in 1971, they belonged to the first generation of speakers who were at that time adopting innovative [ʀ] as their basic consonantal variant. This was, however, more typical of middle and upper-middle class speakers (Sankoff et al. 2001), and the two we follow here were from working-class backgrounds.

Lysiane B. (#7) at age 24 in 1971 was newly married, a factory worker who had not finished high school, but she and her husband were already planning a home in the suburbs and a better life for their family. As described in Blondeau et al. (2002), Lysiane by 1984 had forged a career in sales, and she, her husband and young daughter were indeed living in their suburban home. By 1995, Lysiane had become a realtor, and projected self-confidence in her own mastery of her course in life, as well as pride in her daughter's accomplishments.

André L. (#65) was 27 in 1971, single, and working in his chosen profession as an actor. He talks of his working-class father's aspirations for his children to achieve white-collar status with some job security, but explains how he himself (having finished high school, and recently graduated from a prestigious acting school) prefers living on a limited income with a meaningful profession. At 40, married with a toddler and a new baby, he was still following this financially unrewarding career path in 1984. By 1995, however, he had had to give up on acting and find a more certain source of income, and had shifted, as he explained in his interview, to gerontology, working as an *animateur* in a facility for senior citizens. Even with both himself and his wife working full time, he talks of financial worries supporting a family that now includes a teenager who needs music lessons. Despite these problems, André is clearly someone who finds much satisfaction in both his work and family life.

What kind of diachronic trajectories do these two speakers have? For Lysiane, her dramatic upward social mobility seems to go hand in hand with a dramatic rise in her use of the innovative [ʀ], from only 7% in 1971 to 65% in 1984, after which she steadily but more slowly continues to increase, registering a value of 75% [ʀ] in 1995 (a statistically significant increase between 1984 and 1995). André, in contrast, was already a middle-range user of [ʀ] in 1971. Though the overall values of [ʀ] reported for him increase slightly, from 61% [ʀ] in 1971, to 66% in 1984, to 69% in 1995, these slight increases were not statistically significant, leading us to conclude that André has been a stable mid-range user of [ʀ] over the 24-year period of the study (a pattern atypical of our sample as a whole).

To study stylistic variation, we increased the sample size for both these speakers, and searched as well for portions of their interviews that might be likely to show the most different behavior. Both speakers showed stylistic variation, but in different ways. Since Lysiane had close to categorical use of [r] in 1971, with only 7% [ʀ], our stylistic analysis deals with her in 1984 and 1995, and André in 1971, 1984 and 1995.

The results for Lysiane are reported in Table 2. We first studied three segments in her 1984 interview. We expected that a segment in which she recounts a conflict with the administration of her daughter's school might yield a higher rate of [ʀ] than she uses in discussing more mundane topics, and this did prove to be the case. However, we also expected that she might show a significantly lessened use of [ʀ] in the most emotional segment of the recording, one in which she narrates her daughter's harrowing experience with a near-fatal illness. If Lysiane's use of [r] still represents her 'vernacular' in the sense of its being her dominant form throughout childhood and up through at least the age of 24, we reasoned that this very emotional story might lead her toward more vernacular usage. However, [ʀ]-usage in this segment was not significantly different from its use in Lysiane's recounting of more mundane family history as shown in the first part of Table 2. Only in segment C is [ʀ] use significantly different from – in this case more frequent than – the other two segments (whether considered separately or combined).


Table 2 – [ʀ] and [r] use by topic for Lysiane B. in 1984 and 1995. Tokens of [r] and [ʀ] add to less than the total coded since non-consonantal variants included there did not enter into the percentage calculations.

How can we explain why Lysiane's behavior did not match our expectations in this regard? It may be that we misanalyzed the stylistic nature of segment B – for example, some of it concerns Lysiane's dealing with doctors and hospital authorities, figures who may be parallel to the school authorities in segment C. However, separating this long segment into – for example – the utterances revealing Lysiane's emotional responses from those involving reported conversations with authorities, did not reveal any particular patterning in her use of the two variants. For example, in (1), her use of [ʀ] and [r] shows a preference for using [ʀ] in codas and [r] in onsets<sup>4</sup>, but does not obey any stylistic constraints we could identify. (Other coda r's in clusters in this example were deleted and thus did not enter into the alternation at issue here.)

<sup>4</sup> This is a general tendency we have identified for almost all of the 'mid-range' variable speakers we have analyzed, as discussed in Sankoff & Blondeau (in preparation).


A more likely interpretation of these results is that apical [r] is no longer Lysiane's unmarked, vernacular pronunciation of (r). In her case, it seems that posterior [ʀ] may yet carry the general implication of a pronunciation associated with authority, education, and formality. The one subsection of her encounter with the school administration in which [ʀ] co-occurs with a hyper-formal<sup>5</sup> (and hypercorrect) form is in (2). When Lysiane confronts her daughter's teacher about the lunch policy, asking her who exactly set the policy, the teacher's answer is reported as containing a liaison with infinitival [ʀ] – in a sentence where it would probably have been the past participle which was used. Lysiane continues to report herself as having replied with another infinitival [ʀ]. It would seem almost impossible to have scored this rhetorical coup using an apical [r] in the liaison, yet her emphasis here is on the fact of the liaison itself and not the particular variant of (r) used.


Overall, however, what we see with Lysiane is that the formal passage contains 75% [ʀ] use, without any individual subsections being particularly marked with [ʀ] – perhaps difficult to do when what would be so *marked* would be the statistically unmarked form. Yet nor did the words in which [r] occurred here – or in her other passages – appear to be stylistically marked in any way. Lysiane raises her overall level of [ʀ] use in dealing with a topic marked by formality, yet individual tokens are not associated with a particular stylistic force. This resembles the situation for the negation in French where *ne* is associated with formality without being used all the time in formal contexts (Sankoff & Vincent 1977).

<sup>5</sup> Other hyper-formal elements in this short sentence include the object clitic *en* and the use of the first person plural verbal suffix *–ons* (where normally one would find *Suzanne et moi, on a décidé*).

In 1995, we again studied three segments from Lysiane as illustrated above in Table 2. In none of these segments does [ʀ] use differ significantly from the others. Her discussion of business decisions and difficulties with opening a dress shop (segment F) shows [ʀ] use on a parallel with her 1984 segment on conflict with the authorities at her daughter's school. However, segments D and E, chosen to tap into Lysiane's most unself-conscious speech, showed [ʀ] usage that is not significantly different from segment F. In D, she recounts how her mother was not happy living with Lysiane's family after being widowed, and in E (a passage which begins so emotionally that the tape recorder was turned off for a few minutes), she tells of her grandmother's death. Both of these passages seem to confirm that [ʀ] is now part of Lysiane's vernacular. This time, there are a few tokens of apical [r] in words that carry an ironic flavor, especially in the segment about conflict with her mother, but overall, stylistic variation seems not to be characteristic of Lysiane's use of [ʀ] ~ [r] in 1995.

André is a different story. Though stable across time, André's use of (r) variation seems more closely keyed to the use of individual tokens. Classified as a Middle Class speaker because his high position on the linguistic market index counterbalances his working-class family background, André was an interesting case to study. He was born in 1944 and has several older siblings, and we assume from his family background that he also acquired [r] in his primary language acquisition. However, he is unusual in having undergone training as an actor that included specific attention on the part of teachers and coaches from France whose mission it was to teach the Québécois actors to lose their local accents and speak 'international' French. In both 1971 and 1984, André speaks at length about his profession and in these segments, [r] is almost entirely absent, as shown in Table 3. Segment C differs significantly from A and B in 1971; Segment F in 1984 is virtually the same as the corresponding stylistic segment in 1971, and differs significantly from Segments D and E. In 1995, André was no longer working as an actor and did not talk about the theatre: Segment I, the most formal topic he discussed, differs significantly from G and H.


Table 3 – [ʀ] and [r] use by topic for André L. in 1971, 1984 and 1995.

It is clear that André's stylistic range is greater than that of Lysiane. In the sections devoted to discussion of the theatre, alveolar [r] is almost completely absent. In these sections, he appears to use individual tokens of [r] for stylistic effect, reminiscent of Gumperz' (1982) analysis of metaphorical code-switching. In one 3 1/2 minute segment from section C, there are only 3 alveolar tokens in an otherwise uninterrupted sequence of 80 posterior tokens. Two of the three occur in (3), where André switches from the exaggerated 'French French' accent he adopts for the words *l'accent fʀançais* in the second line, to the common Québécois expression *s'énerver ben gros*. Both in *énerver* and in *gros*, André uses an apical [r], co-occurring with the usual pronunciation of *bien* without the glide whenever it is used in (this nonstandard) adverbial function.


Later on he uses another expression clearly part of the Québécois vernacular when once more he evaluates himself, this time from the standpoint of some of his ambitious theatre school classmates, in the midst of a segment that is otherwise entirely characterized by posterior [ʀ].


Here the phrase *ben ben brillant* uses the colloquial evaluative adverbial "ben ben", never pronounced with a glide, along with the unique use of apical [r] in *brillant*.

André's use of the traditional Montreal [r] continues, albeit in a less unequivocal form, to be stylistically marked in other discourse that includes a much lower rate of [ʀ] use. He tends to use a higher rate of alveolar [r] in contexts referring to the family and to childhood, whether his own or that of his own children. For example, in 1984, after a set of rather impersonal reflections on why he prefers country living to the city, featuring mainly posterior [ʀ], he suddenly mentions the concrete experience of cross-country skiing with his toddler, saying that he loves to go out with his son on days when he doesn't have to work (all words in bold characters feature alveolar [r]):



Like Lysiane, in discussions of family history André intersperses apical and posterior variants throughout. But whereas for her, even a mini-concentration of three or four apical r's in a row does not in itself seem to carry any emotional association, in André's speech apical (r) often appears to cluster in utterances (though not necessarily particular words) that are especially imbued with emotion. These are usually positive but sometimes have a wryly ironic flavor as in (3) and (4) above.

By 1995, André has left the theatre and nowhere does there occur a context in his interview in which he uses [ʀ] as exclusively as in 1971 and 1984. His discussion of his work and of politics in segment I produces only 80% [ʀ], a significant decline from the formal contexts of 1971 and 1984. However, we feel certain that were André once more to talk about his acting career, we would see the same more extreme stylistic range he demonstrated earlier in his life.

Our interpretation of these results from André is that, as a trained actor who has been made sharply aware of dialect differences, he probably represents the upper limit of speakers' ability to deploy the two (r) variants stylistically. This stylistic differentiation for André may be part of the explanation for why he remains a variable speaker and does not show evidence of an overall increase in [ʀ] between the age of 27 and 51. That stylistic variation rather than change over time is important for André can be seen from Figure 1, which plots all of André's segments by topic and year. Between age 27 and 51, André maintains a fairly consistent overall level of [ʀ] in the range of 65% - 70%, but he also maintains clear stylistic differences as can be seen in Table 3 above. His phonological conditioning also remains stable, with codas yielding slightly higher percentages than onsets throughout – a much lesser difference than he exhibits in stylistic range.

Figure 1 – Percentage of [ʀ] for André L. by topic and year.


Comparing Lysiane with André, both speakers from a working-class background, we made the assumption that both acquired [r] as children. This was based on the fact that although some middle- and upper-class speakers in their 20s in 1971 tended to use [ʀ] as their vernacular form, most working-class speakers were predominant users of [r] (Sankoff et al. 2001). When we met Lysiane in 1971, this was still her pattern at age 24. André at 25, with his theatre-school experience behind him, already showed a great stylistic range and a vernacular pattern in which the two forms were in variation. Over the next 24 years, Lysiane's upward social mobility was accompanied by a dramatic increase in her use of [ʀ], but she shows only slight stylistic conditioning in 1984, and none in 1995 when [ʀ] seems to have replaced [r] in her vernacular. André on the other hand has not experienced upward social mobility and has not changed over time, but continues to show stylistic conditioning.

#### 5. Conclusions

To date, there have been relatively few panel studies in which data on individuals have been reported over the span of a decade or more. Looking at vowel systems, Brink & Lund (1979) and Labov & Auger (1998) have shown stability in individual speakers, similar to the roughly two-thirds of our speakers who were stable across time. The majority of vowels of the speaker studied by Prince (1987, 1988) were also stable over four decades. In the domain of morphology, research on Montreal French auxiliary selection has shown stability in all but one or two of 60 speakers between 1971 and 1994, in the face of community change toward the use of *être* (Sankoff et al. 2004). Further work on the alternation between periphrastic and inflected future has found stability for the majority of the same 60 panel speakers, with upper-class speakers showing retrograde change, increasing their use of the inflected future across their adult lives (Wagner & Sankoff 2011). For the alternation between *a gente* and the first person plural in Portuguese, Zilles (2005) reports that 11 of 13 speakers in a panel study across roughly two decades were stable in their use of *a gente*, with the other two speakers showing retrograde change over their lifespans. Ashby (2001) reports that of 10 French speakers followed across a 19-year period, 6 were stable in their use of *ne*-deletion. Of the remaining four, three reduced their use of *ne* (the direction of community change) and one was anomalous in her increased use. A study of noun phrase agreement in Portuguese has also shown that across two decades, a sizeable minority of speakers (5 of 16) substantially increased their use of agreement, the direction of community change (Naro & Scherre 2002).

Taken together, these panel studies demonstrate that although speaker stability in adult life seems to be the majority pattern, we frequently find a sizeable minority of speakers dramatically increasing their use of the innovative variant, with small minorities becoming more conservative as they age.

Several of these studies have, like our study of Montreal (r), included a larger trend component along with a study of a subset of speakers as a panel. The studies of Ashby (2001), Naro & Scherre (2002), and Zilles (2005) concur with our research in two important respects: (1) community change outpaces that of individual speakers across their lifespans; and (2) in all these cases where options are binary, with no intermediate forms involved, change for individual speakers is often quite dramatic. It is possible that the fact that options are binary and discrete, in the [r] → [ʀ] case as in the morphological alternations, makes possible the abrupt and rapid character of the change, as opposed to the slow and incremental nature of many of the vocalic changes described in previous research.

The implications of these results, in terms of whether individual grammar change in adult life is a matter only of quantitative change or whether qualitative change is involved, are the topic of Sankoff (in preparation). What we can reliably say on this point at present is that most of those speakers who changed from *intermediate range* use of [ʀ] to categorical or virtually categorical use also went from a grammar where onsets and codas differentially conditioned (r) variation to a grammar that lacked this conditioning.

Our analysis in this paper has concentrated on the middle phase of a very rapid change, investigating the stylistic conditioning of the variation. Sensitivity to stylistic conditioning has proved to be complex, as illustrated by the detailed analysis of the alternation for two speakers across the lifespan. These two speakers, both of whom acquired the apical variant as children, are not equally sensitive to the stylistic environment. Our analysis has shown that one of the two speakers already manipulated the alternation of the variants for stylistic purposes at the age of 25 in 1971 due to his personal background as an actor, and maintained this ability in later life. However, the other speaker, who was still using her vernacular [r] pattern at the age of 24 in 1971, changed dramatically toward [ʀ], probably due at least in part to her upward social mobility, without having shown a clear stylistic manipulation of the variants. In her case, it seems that one variant has replaced the other as the default variant. Further research combining trend and panel studies needs to be done on other variables involved in the process of change if we want to better understand the relation between stylistic markedness and the process of change.

#### References


(eds.), *'r-atics: Sociolinguistic, phonetic and phonological characteristics of /r/*, 141-157. Brussels: ILVP.


## List of contributors

#### Mary Baltazani

Mary Baltazani is an Assistant Professor and the Director of the Phonetics Laboratory in the Department of Philology at the University of Ioannina, Greece. She holds a PhD (2002) in Linguistics (UCLA). Her research interests are in phonetics, laboratory phonology, dialectology, intonation and pragmatics, the interface of phonetics with phonology, as well as the interface of intonation with pragmatics, semantics and syntax.

#### Štefan Beňuš

Štefan Beňuš received his PhD in linguistics from New York University in 2005. He has been examining the phonetics-phonology interface in various aspects of Slovak. His second major area of expertise is the relationship between speech prosody and pragmatic/discourse aspects of the message as well as the emotional state of the speaker delivering the message.

#### Hélène Blondeau

Hélène Blondeau is Associate Professor of French and Francophone Studies and Linguistics at the University of Florida. Her research interests encompass language variation and change as well as language contact and bilingualism. She has published on sociolinguistic variation and language practices of Francophone communities in North America, including a 2011 book on pronominal variation in Quebec French.

#### Lasse Bombien

Lasse Bombien is a research scientist at the Institute of Phonetics and Speech Processing, Munich University, and is currently a visiting scholar at the Linguistics Department, USC Los Angeles. His main research interests include gestural coordination across articulatory tiers, techniques for investigating speech kinematics and the development of software for acquiring and analyzing speech data.

#### Evan-Gary Cohen

Evan Cohen lectures on phonology and phonetics at Tel Aviv University. Evan's research focuses on the phonology-phonetics interface, relying on raw acoustic data in formal phonological frameworks. His primary fields of research include various aspects of loanword adaptation, acquisition, vowel perception and production, and a variety of phenomena pertaining to the Hebrew rhotic.

#### Philip Hoole

Philip Hoole is Senior Lecturer at the Institute of Phonetics and Speech Processing, Munich University. His main research interests include linguistic phonetics, speech motor control and laryngeal articulation, with special emphasis on instrumental studies of articulatory coordination.

#### Ghada Khattab

Ghada Khattab is a phonetics lecturer at Newcastle University. Her research interests include laboratory phonology, Arabic phonetics and phonology, monolingual and bilingual phonological acquisition, and sociophonetics, particularly in relation to accent (dialect) acquisition by bilinguals.

#### Katerina Nicolaidis

Dr. Katerina Nicolaidis is an Associate Professor at the Department of Theoretical and Applied Linguistics, School of English, Aristotle University of Thessaloniki. She holds a PhD in Phonetics from the University of Reading, UK. She is the director of the Phonetics Laboratory of the School of English. She has been the Vice-President of the International Phonetic Association (IPA) and President of the Permanent Council for the Organisation of the International Congress of Phonetic Sciences since 2011. She also served as Secretary of the IPA during 2003-2011. Her research interests are in the area of experimental phonetics. She has worked on several research projects and has carried out research in normal and disordered speech production, phonological acquisition, coarticulation, articulatory variability in different speaking styles, speech production in noise, and methodology of teaching pronunciation.

#### Cédric Patin

Cédric Patin completed his PhD in Linguistics at Université Paris 3 in 2007. His thesis examined the tonal system of the Bantu language Shingazidja. After a postdoctorate at the Laboratoire de Linguistique Formelle (CNRS/Université Paris 7), where he worked on the prosody-grammar interface, he accepted the position of Maître de conférences en phonétique et phonologie du français at Université Lille 3 in 2009. His work focuses on the phonology of Bantu languages, with emphasis on the prosody-syntax interface.

#### Marianne Pouplier

Marianne Pouplier is Senior Researcher at the Institute of Phonetics and Speech Processing, Ludwig-Maximilians University Munich. Her main research interests include phonetics, speech production and the phonetics-phonology interface.

#### Reenu Punnoose

Dr. Reenu Punnoose completed her PhD, titled "An auditory and acoustic study of liquids in Malayalam", at Newcastle University in 2011. Her research interests include acoustic phonetics, articulatory phonetics, Dravidian phonetics and phonology, sociophonetics, bilingualism and second language acquisition.

#### María Riera

María Riera is a PhD candidate in the Department of English and German Studies at Universitat Rovira i Virgili, Tarragona, Spain, where she also teaches English language and phonetics/phonology courses. Her research interests are acoustic phonetics, speech production and perception, the phonetics-phonology interface, sound change and the teaching of English pronunciation to speakers of Spanish/Catalan.

#### Antonio Romano

Antonio Romano is a tenure-track researcher at the University of Turin (Italy). Since 2006, he has been responsible for the scientific activities of the LFSAG (Laboratory of Experimental Phonetics "Arturo Genre") at the same university. He is associated with the coordination of the AMPER project (Atlas Multimédia Prosodique de l'Espace Roman) (with Michel Contini, Univ. of Grenoble, France) and is a member of the board of directors of the AISV (Associazione Italiana di Scienze della Voce - Italian Association of Voice Sciences).

#### Joaquín Romero

Joaquín Romero. Master of Arts (1992) and Ph.D. (1995) in Linguistics, University of Connecticut. Associate Professor at Universitat Rovira i Virgili, Tarragona, Spain (1996 to present). Research interests include articulatory phonetics and phonology, the phonetics-phonology interface, sound change and English pronunciation teaching methodology.

#### Gillian Sankoff

A native of Montreal, Gillian Sankoff studied anthropology and languages at McGill University. She taught in the Anthropology Department of the Université de Montréal from 1968 to 1979, where, with David Sankoff and Henrietta Cedergren, she designed and carried out a major sociolinguistic study of Montreal French. That study was repeated in the 1980s and 1990s by former students, and this longitudinal corpus is now the basis for her current trend and panel studies of change and variation. Like the paper with Hélène Blondeau in this volume, much of her longitudinal research results from joint work with others who have been involved in the project over the decades. Since 1979, she has been a member of the Linguistics Department at the University of Pennsylvania.

#### Carmen-Florina Savu

Carmen-Florina Savu is a Linguistics M.A. graduate from the University of Bucharest. Her research interests include the Phonetics-Phonology interface, and more specifically the ability of liquids to behave as syllabic consonants in the Slavic languages.

#### James M. Scobbie

Professor James M. Scobbie trained as a phonologist at the University of Edinburgh, and now is Director of the CASL Research Centre at Queen Margaret University. The centre specialises in articulatory phonetic research, primarily with the goal of improving the diagnosis and treatment of speech disorders through visual biofeedback of speech articulation in real time.

#### Lorenzo Spreafico

Lorenzo Spreafico is a non-tenure-track researcher at the Language Study Unit, Free University of Bozen-Bolzano. His current research interests include monolingual, bilingual and L2 phonological acquisition, with special emphasis on articulatory phonetics.

#### Nasir A. Syed

Nasir A. Syed is Assistant Professor of English at the Department of English Language and Literature, Lasbela University, Uthal, Balochistan (Pakistan). He studied theoretical phonology and second language acquisition at the University of Essex, United Kingdom. He is interested in the study of Pakistani English (PE) and indigenous Indo-Aryan languages spoken in Pakistan. His main areas of research include L1 & L2 phonology, socio-phonetics, applied linguistics and historical linguistics.

#### Evie Tops

Evie Tops obtained her PhD at the Vrije Universiteit Brussel, with a sociophonetic study of /r/ in Flanders. She is now a lecturer in Dutch language and linguistics at the Université Libre de Bruxelles. Her current research focuses on institutional and ethnolinguistic aspects of the teaching of Dutch as a foreign language.

#### Marijn van 't Veer

Marijn van 't Veer is a PhD candidate at the Leiden University Centre for Linguistics. He is writing a dissertation on the acquisition of the phonological inventory. Apart from his interest in the cross-over of acquisition and theoretical phonology, research interests include rhotics, and sonorants and sonority in general.

#### Roeland van Hout

Roeland van Hout is professor of applied and variation linguistics at the Radboud University Nijmegen (Centre for Language Studies). His work focuses on language variation and change and second language acquisition, from an interdisciplinary perspective (sociology, linguistics, psychology). He also publishes on the application of statistics in language research. With Hans Van de Velde, he co-founded the *'r-atics* conferences.

#### Hans Van de Velde

Hans Van de Velde is senior lecturer in sociolinguistics at Utrecht University (Utrecht Institute of Linguistics OTS). His research mainly focuses on phonetic variation and change and the patterns of convergence and divergence in the Dutch language area. He has also been involved in studies on urbanization, language contact and identity formation in China. While working on his dissertation he became fascinated by the extreme variability of /r/ in Dutch, and he co-founded the *'r-atics* conferences with Roeland van Hout.

#### Alessandro Vietti

Alessandro Vietti is a tenured researcher in Linguistics at the Free University of Bozen-Bolzano and Director of the Laboratory of Experimental Phonetics. His main research fields are laboratory phonology and sociophonetics, and more specifically his work focuses on language contact and bilingualism (with emphasis on Italian and German dialects).

#### ALP - Alpine Laboratory of Phonetics

www.unibz.it/labphon
